Version: 1.7.0


Neural Magic's DeepSparse is a CPU runtime that takes advantage of sparsity within neural networks to reduce compute.
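To see why sparsity reduces compute, consider that a multiply-accumulate against a zero-valued (pruned) weight contributes nothing to the output and can be skipped. The sketch below is a conceptual illustration only, not DeepSparse's actual kernels (which operate on compressed weight formats with vectorized CPU instructions): it compares the operation count of a dense matrix-vector product against one that skips zeros.

```python
def dense_matvec(weights, x):
    """Multiply-accumulate over every weight, zero or not."""
    ops, out = 0, []
    for row in weights:
        acc = 0.0
        for w, v in zip(row, x):
            acc += w * v
            ops += 1
        out.append(acc)
    return out, ops

def sparse_matvec(weights, x):
    """Skip zero weights, so pruned connections cost nothing."""
    ops, out = 0, []
    for row in weights:
        acc = 0.0
        for w, v in zip(row, x):
            if w != 0.0:
                acc += w * v
                ops += 1
        out.append(acc)
    return out, ops

# A heavily pruned toy layer: most weights are zero.
weights = [
    [0.0, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.25, 0.0, 0.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0, 5.0]

dense_out, dense_ops = dense_matvec(weights, x)
sparse_out, sparse_ops = sparse_matvec(weights, x)
assert dense_out == sparse_out  # identical outputs
print(dense_ops, sparse_ops)    # 15 multiply-accumulates vs. 2
```

The outputs are identical, but the sparsity-aware version performs 2 multiply-accumulates instead of 15. DeepSparse applies this principle (plus cache-aware execution) to pruned networks at scale on CPUs.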

DeepSparse supports a variety of LLM, computer vision, and NLP models, and integrates with popular deep learning libraries such as Hugging Face and Ultralytics, letting you load and deploy sparse models with ONNX. ONNX gives you the flexibility to serve your model in a framework-agnostic environment. Coupled with SparseML, our optimization library for pruning and quantizing models, DeepSparse delivers exceptional inference performance on CPU hardware.


Review deployment, training, and software requirements to confirm DeepSparse is compatible with your use case.

Editions and Licenses

DeepSparse is available in two editions, each with its own end-user license:


Guides to get you started.

Deployment Options

Ready to deploy?


Some source code, example files, and scripts included in the DeepSparse GitHub repository are licensed under the Apache License Version 2.0 as noted.

