Get Started


Neural Magic’s novel algorithms enable convolutional neural networks to run on commodity CPUs – at GPU speeds and better. Data scientists no longer have to compromise on model design and input size, or deal with scarce and costly GPU resources. Neural Magic is making the power of deep learning simple, accessible, and affordable for anyone.

Neural Magic’s Deep Sparse architecture is designed to mimic, on commodity hardware, the way brains compute. It uses neural network sparsity combined with locality of communication by utilizing the CPU’s large fast caches and its very large memory.

Sparsification through pruning is a broadly studied ML technique, allowing reductions of 10x or more in the size and the theoretical compute needed to execute a neural network, without losing much accuracy. So, while a GPU runs networks faster using more FLOPs, Neural Magic runs them faster via a reduction in the necessary FLOPs.
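To make the FLOP argument concrete, here is a rough back-of-the-envelope sketch. It assumes the usual estimate of two FLOPs (one multiply, one add) per weight per output position for a dense convolution, and it assumes an idealized engine that skips every zeroed weight. The layer dimensions and the function names are illustrative only and are not taken from any Neural Magic model or API.

```python
def conv2d_flops(c_in, c_out, kernel, h_out, w_out):
    """Theoretical FLOPs of a dense 2D convolution.

    Counts one multiply and one add per weight per output position.
    """
    return 2 * c_in * c_out * kernel * kernel * h_out * w_out


def sparse_flops(dense_flops, sparsity):
    """FLOPs left if a `sparsity` fraction of weights is zero and skipped."""
    return dense_flops * (1.0 - sparsity)


# Illustrative layer: 3x3 convolution, 64 -> 64 channels, 56x56 output map.
dense = conv2d_flops(c_in=64, c_out=64, kernel=3, h_out=56, w_out=56)
sparse = sparse_flops(dense, sparsity=0.9)
reduction = dense / sparse

print(f"dense FLOPs:      {dense:,}")
print(f"90% sparse FLOPs: {sparse:,.0f}")
print(f"reduction:        {reduction:.1f}x")
```

Under these assumptions, pruning 90% of the weights corresponds to roughly a 10x cut in theoretical compute. Real speedups depend on how well the runtime exploits the zeros.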


Sparsification is the process of taking a trained deep learning model and removing redundant information from the over-precise and over-parameterized network, resulting in a faster and smaller model. Sparsification techniques are all-encompassing, ranging from inducing sparsity through pruning and quantization to exploiting naturally occurring sparsity through activation sparsity or Winograd/FFT. When implemented correctly, these techniques produce significantly faster and smaller models with little to no effect on the baseline metrics.
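As a minimal illustration of inducing sparsity through pruning, the sketch below applies unstructured magnitude pruning to a flat list of weights: the smallest-magnitude values are set to zero. This is a generic textbook technique shown in plain Python, not the algorithm used by Sparsify or SparseML, and the function name and sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    `sparsity` is the fraction of weights to remove, between 0 and 1.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by ascending magnitude; the first n_prune get zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights for illustration.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.1, -0.9, 0.04, 0.5]
pruned = magnitude_prune(weights, sparsity=0.5)
actual_sparsity = pruned.count(0.0) / len(pruned)

print(pruned)
print(f"sparsity: {actual_sparsity:.0%}")
```

In practice, pruning is usually applied gradually during training or fine-tuning so the remaining weights can compensate and accuracy can recover.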

Software Components

The Deep Sparse product suite builds on sparsification, enabling you to apply these techniques to your own datasets and models using recipe-driven approaches. Recipes encode the directions for sparsifying a model in a simple, easily editable format.

  • Download a sparsification recipe and sparsified model from the SparseZoo.

  • Alternatively, create a recipe for your model using Sparsify.

  • Apply your recipe with only a few lines of code using SparseML.

  • Finally, for GPU-level performance on CPUs, deploy your sparse-quantized model with the DeepSparse Engine.

Our Sparsify and SparseML tools allow us to easily reach industry-leading levels of sparsity while preserving baseline accuracy, and the DeepSparse Engine’s breakthrough sparse kernels execute the resulting computation efficiently.
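To give a feel for what "recipe-driven" can mean, here is a toy sketch in which a small dictionary encodes a pruning schedule and a function turns it into a per-epoch sparsity target. The field names and the dictionary layout are hypothetical and are not the SparseML recipe schema. The cubic ramp is one common choice for gradual pruning, assumed here for illustration.

```python
# Hypothetical recipe: field names are illustrative, not the SparseML schema.
recipe = {
    "start_epoch": 0,
    "end_epoch": 10,
    "init_sparsity": 0.05,
    "final_sparsity": 0.85,
}


def target_sparsity(recipe, epoch):
    """Sparsity to apply at `epoch`, ramping from init to final sparsity."""
    start, end = recipe["start_epoch"], recipe["end_epoch"]
    s_init, s_final = recipe["init_sparsity"], recipe["final_sparsity"]
    if epoch <= start:
        return s_init
    if epoch >= end:
        return s_final
    progress = (epoch - start) / (end - start)
    # Cubic ramp: sparsity rises quickly at first, then levels off.
    return s_final + (s_init - s_final) * (1.0 - progress) ** 3


schedule = [round(target_sparsity(recipe, e), 3) for e in range(0, 12, 2)]
for epoch, sparsity in zip(range(0, 12, 2), schedule):
    print(f"epoch {epoch:2d}: target sparsity {sparsity:.3f}")
```

Keeping the schedule in data rather than in code is what makes this kind of recipe easy to edit: changing the final sparsity or the epoch range requires no change to the training loop.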

[Figure: Full Deep Sparse Platform flow]

Supported Architectures & Frameworks

Computer Vision Applications

  Sample Models: YOLOs, ResNets, MobileNets, EfficientNets, Single-Shot Detectors (SSDs)
  Use Cases (Domains): Image Classification, Object Detection, NLP
  Frameworks: ONNX, Keras, PyTorch, TensorFlow

Today, we offer support for convolutional neural network-based computer vision models, specifically classification and object detection model types such as the models in SparseZoo.

We are continuously exploring models to add to our supported model list and the SparseZoo, including model architectures beyond computer vision and NLP. Subscribe for updates.


PyTorch and ONNX

Inputs to Sparsify and the DeepSparse Engine are standardized on the ONNX format. PyTorch has native ONNX export and requires fewer steps than the other supported frameworks, such as Keras or TensorFlow. If you have flexibility in your choice of framework, consider starting with PyTorch.

Model Considerations

Dynamic shapes are not currently supported; be sure to use models with fixed input shapes and compile the model for a particular batch size. Dynamic shapes and dynamic batch sizes are on the Neural Magic roadmap; subscribe for updates.
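Because the model is compiled for one batch size, an application that receives a variable number of inputs has to regroup them into batches of exactly that size. The sketch below shows one possible way to do this by padding the final batch; the helper name, the padding strategy, and the placeholder inputs are all assumptions for illustration and are not part of the DeepSparse Engine API.

```python
def fixed_size_batches(items, batch_size, pad_value):
    """Split `items` into batches of exactly `batch_size` elements.

    The last batch is padded with `pad_value`. Each result is a
    (batch, n_valid) pair so outputs for padding can be discarded later.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = []
    for start in range(0, len(items), batch_size):
        chunk = list(items[start:start + batch_size])
        n_valid = len(chunk)
        # Pad so every batch matches the size the model was compiled for.
        chunk.extend([pad_value] * (batch_size - n_valid))
        batches.append((chunk, n_valid))
    return batches


# Placeholder strings stand in for fixed-shape input tensors.
requests = ["img0", "img1", "img2", "img3", "img4"]
batches = fixed_size_batches(requests, batch_size=4, pad_value="blank")

for batch, n_valid in batches:
    print(batch, "valid:", n_valid)
```

After inference, only the first `n_valid` outputs of each batch would be kept. Padding wastes some compute on the final batch, so a batch size close to the typical request volume tends to work best.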

Model inferences are executed as a single stream by default; concurrent execution can be enabled depending on the engine execution strategy.

Try it Out

Not sure where to start? Here are several hands-on experiences you can work through, from benchmarking to deployment.

1. Benchmarking Performance

A number of pre-trained, performant deep learning models are available via our API in the SparseZoo. Included are both baseline and recalibrated models for higher performance with the DeepSparse Engine on CPUs.

2. Transfer Learn or Train from Scratch

Apply our YOLOv3 model to your own data with transfer learning: