

Libraries for creating sparse deep neural networks trained on your data with just a few lines of code



SparseML is an open-source model optimization toolkit that enables you to create inference-optimized sparse models using pruning, quantization, and distillation algorithms. Models optimized with SparseML can then be exported to ONNX and deployed with DeepSparse for GPU-class performance on CPU hardware.


SparseML enables you to create a sparse model trained on your dataset in two ways:

  • Sparse Transfer Learning enables you to fine-tune a pre-sparsified model from SparseZoo (an open-source repository of sparse models such as BERT, YOLOv5, and ResNet-50) onto your dataset while maintaining sparsity. This pathway works just like the typical fine-tuning you are used to in training CV and NLP models, and is strongly preferred if your model architecture is available in SparseZoo.

  • Sparsification from Scratch enables you to apply state-of-the-art pruning (such as gradual magnitude pruning or OBS pruning) and quantization (such as quantization-aware training) algorithms to arbitrary PyTorch and Hugging Face models. This pathway requires more experimentation, but allows you to create a sparse version of any model.
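Gradual magnitude pruning, mentioned above, can be sketched in a few lines of plain Python. This is an illustration of the general technique, not SparseML's implementation: it assumes the cubic sparsity schedule of Zhu and Gupta (2017), and the helper names `gmp_sparsity` and `magnitude_prune` are hypothetical. Sparsity ramps from an initial to a final value over a pruning window, and at each step the smallest-magnitude weights are set to zero.

```python
from typing import List


def gmp_sparsity(step: int, start: int, end: int,
                 init_sparsity: float, final_sparsity: float) -> float:
    """Target sparsity at `step` under a cubic gradual-pruning schedule (assumed)."""
    if step <= start:
        return init_sparsity
    if step >= end:
        return final_sparsity
    progress = (step - start) / (end - start)  # fraction of the pruning window elapsed
    # Cubic decay of the remaining gap: prune quickly early, slowly near the end.
    return final_sparsity + (init_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    num_to_prune = int(sparsity * len(weights))
    if num_to_prune == 0:
        return list(weights)
    # Indices ordered by absolute value; the first `num_to_prune` are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_to_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08]
    for step in (0, 25, 50, 75, 100):
        target = gmp_sparsity(step, start=0, end=100,
                              init_sparsity=0.05, final_sparsity=0.75)
        print(step, round(target, 3), magnitude_prune(weights, target))
```

The cubic curve removes many weights early, when redundancy is highest, and slows down near the end so the network has time to recover accuracy between pruning steps.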





Installation Requirements

This repository is tested on Python 3.7-3.10 and Linux/Debian systems.

It is recommended to install in a virtual environment to keep your system in order. The currently supported ML framework is PyTorch: torch>=1.1.0,<=1.12.1, excluding 1.10 and 1.11.

Install with pip using:

pip install sparseml

More information on installation such as optional dependencies and requirements can be found here.
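Because the supported torch range above has gaps, a quick check after installation can save debugging time. The sketch below is a hypothetical helper (`is_supported_torch` is not part of SparseML) that encodes the stated range using only the standard library. It assumes Python 3.8+ for `importlib.metadata`, treats pre-release tags such as `rc1` as the plain release, and the official installation docs remain the authority.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.12.1+cu113' into (1, 12, 1); drops local and pre-release suffixes."""
    core = version.split("+")[0]  # strip local build tags like '+cu113'
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # stop at a tag such as '0rc1'
            break
    return tuple(parts)


def is_supported_torch(version: str) -> bool:
    """Check against the range stated above: >=1.1.0, <=1.12.1, minus 1.10.x and 1.11.x."""
    v3 = (parse_version(version) + (0, 0, 0))[:3]  # pad to (major, minor, patch)
    if not (1, 1, 0) <= v3 <= (1, 12, 1):
        return False
    return v3[:2] not in {(1, 10), (1, 11)}


def installed_torch_supported() -> Optional[bool]:
    """None if torch is not installed, otherwise whether its version is in range."""
    try:
        return is_supported_torch(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("torch supported:", installed_torch_supported())
```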

Quick Tour


To enable flexibility, ease of use, and repeatability, SparseML uses a declarative interface called recipes for specifying the sparsity-related algorithms and hyperparameters that SparseML should apply.

Recipes are YAML files formatted as a list of modifiers, which encode the instructions for SparseML. A modifier can do anything from setting the learning rate to encoding the hyperparameters of the gradual magnitude pruning algorithm. SparseML parses each recipe into a native format for the target framework and applies the modifications to the model and training pipeline.
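The modifier idea can be illustrated with a toy interpreter. Everything here is hypothetical: the `RECIPE` list stands in for what a YAML loader might return from a recipe file, and `ConstantLR` and `LinearPruning` are made-up modifier types, not SparseML's own modifier classes or recipe schema. The point is the pattern: a recipe is data, each entry becomes a modifier object, and the training loop applies whichever modifiers are active at the current epoch.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-in for a parsed recipe file: a list of modifier entries.
RECIPE = [
    {"type": "ConstantLR", "start_epoch": 0, "end_epoch": 10, "lr": 0.01},
    {"type": "LinearPruning", "start_epoch": 2, "end_epoch": 6, "final_sparsity": 0.8},
]


@dataclass
class ConstantLR:
    start_epoch: float
    end_epoch: float
    lr: float

    def apply(self, state: Dict[str, float], epoch: float) -> None:
        state["lr"] = self.lr


@dataclass
class LinearPruning:
    start_epoch: float
    end_epoch: float
    final_sparsity: float

    def apply(self, state: Dict[str, float], epoch: float) -> None:
        # Linear ramp for brevity; GMP schedules typically use a cubic curve.
        progress = (epoch - self.start_epoch) / (self.end_epoch - self.start_epoch)
        state["sparsity"] = self.final_sparsity * progress


MODIFIER_TYPES = {"ConstantLR": ConstantLR, "LinearPruning": LinearPruning}


def parse_recipe(recipe: List[dict]) -> list:
    """Instantiate one modifier object per recipe entry, keyed on its 'type'."""
    modifiers = []
    for entry in recipe:
        params = {k: v for k, v in entry.items() if k != "type"}
        modifiers.append(MODIFIER_TYPES[entry["type"]](**params))
    return modifiers


def run_schedule(modifiers: list, num_epochs: int) -> List[Dict[str, float]]:
    """Apply each modifier whose [start, end] window contains the epoch; record state."""
    state = {"lr": 0.1, "sparsity": 0.0}  # hypothetical defaults before the recipe acts
    history = []
    for epoch in range(num_epochs):
        for mod in modifiers:
            if mod.start_epoch <= epoch <= mod.end_epoch:
                mod.apply(state, epoch)
        history.append(dict(state))
    return history


if __name__ == "__main__":
    for epoch, state in enumerate(run_schedule(parse_recipe(RECIPE), 8)):
        print(epoch, state)
```

Changing the schedule means editing the data rather than the training code, which is what makes recipe-driven runs easy to repeat and share.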

Python API

Because of the declarative, recipe-based approach, you can add SparseML to your existing PyTorch training pipelines. The ScheduledModifierManager class is responsible for parsing the YAML recipes and overriding standard PyTorch model and optimizer objects, encoding the logic of the sparsity algorithms from the recipe. Once you call manager.modify, you can use the model and optimizer as usual; SparseML abstracts away the complexity of the sparsification algorithms.

The workflow looks like this:

model = Model()  # model definition (placeholder)
optimizer = Optimizer()  # optimizer definition (placeholder)
train_data = TrainData()  # train data definition (placeholder)
batch_size = BATCH_SIZE  # training batch size
steps_per_epoch = len(train_data) // batch_size  # optimizer steps per epoch

from sparseml.pytorch.optim import ScheduledModifierManager

manager = ScheduledModifierManager.from_yaml(PATH_TO_RECIPE)  # parse the recipe
optimizer = manager.modify(model, optimizer, steps_per_epoch)  # apply the recipe to model and optimizer

# typical PyTorch training loop, using your model/optimizer as usual
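Conceptually, `manager.modify` hands back an optimizer you keep calling as usual while sparsity logic runs around each step. The sketch below imitates that wrapping pattern with toy, hypothetical classes (`ToySGD`, `MaskedOptimizerWrapper`); it is not SparseML's internal code. It shows why `steps_per_epoch` is needed (to convert optimizer step counts into the epoch units recipes are written in) and how a mask keeps pruned weights at zero while the rest of the model trains.

```python
from typing import List


class ToySGD:
    """Stand-in for a framework optimizer: plain SGD over a list of floats."""

    def __init__(self, params: List[float], lr: float) -> None:
        self.params = params  # updated in place, like framework parameters
        self.lr = lr

    def step(self, grads: List[float]) -> None:
        for i, g in enumerate(grads):
            self.params[i] -= self.lr * g


class MaskedOptimizerWrapper:
    """Hypothetical wrapper: delegates step(), then enforces a sparsity mask."""

    def __init__(self, optimizer: ToySGD, mask: List[int], steps_per_epoch: int) -> None:
        self.optimizer = optimizer
        self.mask = mask  # 1 = keep weight, 0 = pruned
        self.steps_per_epoch = steps_per_epoch
        self.global_step = 0

    @property
    def epoch(self) -> float:
        # Recipes speak in epochs; the wrapper only ever sees optimizer steps.
        return self.global_step / self.steps_per_epoch

    def step(self, grads: List[float]) -> None:
        self.optimizer.step(grads)  # the usual parameter update
        for i, keep in enumerate(self.mask):  # re-zero pruned weights so sparsity holds
            if not keep:
                self.optimizer.params[i] = 0.0
        self.global_step += 1


if __name__ == "__main__":
    params = [0.5, 0.0, -0.3, 0.0]
    opt = MaskedOptimizerWrapper(ToySGD(params, lr=0.1), mask=[1, 0, 1, 0], steps_per_epoch=4)
    for _ in range(6):
        opt.step([1.0, 1.0, -1.0, 1.0])  # gradients would push pruned weights off zero
    print(params, opt.epoch)
```

Wrapping rather than replacing the optimizer is what lets the training loop stay unchanged: the loop still calls `step()`, and the sparsity bookkeeping happens inside it.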

SparseML CLI

In addition to the code-level API, SparseML offers pre-made training pipelines for common NLP and CV tasks via its CLI. The CLI enables you to kick off training runs with utilities like dataset loading and pre-processing, checkpoint saving, metric reporting, and logging handled for you. This makes it easy to get up and running in common training pathways.

For instance, we can use the following to kick off a YOLOv5 sparse transfer learning run onto the VOC dataset (using SparseZoo stubs to pull down a sparse model checkpoint and transfer learning recipe):

sparseml.yolov5.train \
  --weights zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned85_quant-none?recipe_type=transfer_learn \
  --recipe zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned85_quant-none?recipe_type=transfer_learn \
  --data VOC.yaml \
  --hyp hyps/hyp.finetune.yaml --cfg yolov5s.yaml --patience 0
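The same run can be driven from Python. This sketch assumes only what the command above shows: it builds the argument list for `sparseml.yolov5.train` (no new flags) and uses `urllib.parse` to split the stub into its path segments and the `recipe_type` query argument, without interpreting what each path segment means. The helper names `split_stub` and `build_train_command` are hypothetical.

```python
import shlex
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit

STUB = ("zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/"
        "pruned85_quant-none?recipe_type=transfer_learn")


def split_stub(stub: str) -> Dict[str, object]:
    """Split a SparseZoo-style stub into scheme, path segments, and query arguments."""
    parts = urlsplit(stub)
    return {
        "scheme": parts.scheme,  # 'zoo'
        "path": parts.path.split("/"),  # model path segments, left uninterpreted
        "query": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


def build_train_command(stub: str, data: str = "VOC.yaml") -> List[str]:
    """Assemble the argv of the command above as a list (no shell involved)."""
    return [
        "sparseml.yolov5.train",
        "--weights", stub,
        "--recipe", stub,
        "--data", data,
        "--hyp", "hyps/hyp.finetune.yaml",
        "--cfg", "yolov5s.yaml",
        "--patience", "0",
    ]


if __name__ == "__main__":
    print(split_stub(STUB)["query"])
    # shlex.join quotes the '?' so the printed line is safe to paste into a shell.
    print(shlex.join(build_train_command(STUB)))
    # To launch for real: subprocess.run(build_train_command(STUB), check=True)
```

Passing a list to `subprocess.run` bypasses the shell entirely. When typing the command by hand, quoting the stubs is a safe habit, since some shells (zsh by default) treat an unquoted `?` as a glob and abort with "no matches found".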

Additional Resources

More information on the code base and contained processes can be found in the SparseML documentation.


Learning More

Release History

Official builds are hosted on PyPI.

Additionally, more information can be found via GitHub Releases.


The project is licensed under the Apache License Version 2.0.



We appreciate contributions to the code, examples, integrations, and documentation as well as bug reports and feature requests! Learn how here.


For user help or questions about SparseML, sign up or log in to our Neural Magic Community Slack. We are growing the community member by member and happy to see you there. Bugs, feature requests, or additional questions can also be posted to our GitHub Issue Queue.

You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by subscribing to the Neural Magic community.

For more general questions about Neural Magic, please fill out this form.


Find this project useful in your research or other communications? Please consider citing:

@InProceedings{pmlr-v119-kurtz20a,
  title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks},
  author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {5533--5543},
  year = {2020},
  editor = {Hal Daumé III and Aarti Singh},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  address = {Virtual},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {},
  url = {},
  abstract = {Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.}
}

@misc{singh2020woodfisher,
  title={WoodFisher: Efficient Second-Order Approximation for Neural Network Compression},
  author={Sidak Pal Singh and Dan Alistarh},
  year={2020},
  eprint={2004.14340},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}