# SparseML


Libraries enabling creation of sparse deep-neural networks trained on your data with just a few lines of code



SparseML is a toolkit that includes APIs, CLIs, scripts, and libraries that enable you to create sparse models trained on your data.

SparseML provides two options to accomplish this goal:

  • Sparse Transfer Learning: Fine-tune state-of-the-art pre-sparsified models from the SparseZoo onto your dataset while preserving sparsity.

  • Sparsifying from Scratch: Apply state-of-the-art sparsification algorithms such as pruning and quantization to any neural network.

These options are useful for different situations:

  • Sparse Transfer Learning is the easiest path to creating a sparse model trained on your data. Pull down a sparse model from SparseZoo and point our training scripts at your data without any hyperparameter search. This is the recommended pathway for supported use cases like Image Classification, Object Detection, and several NLP tasks.

  • Sparsifying from Scratch gives you the flexibility to prune any neural network for any use case, but requires more training epochs and hand-tuning hyperparameters.

Each of these avenues uses YAML-based recipes that simplify integration with popular deep learning libraries and frameworks.

## Highlights


Integrations

  • PyTorch: MobileNetV1, ResNet-50
  • Ultralytics: YOLOv3
  • Ultralytics: YOLOv5
  • Hugging Face: BERT
  • rwightman: ResNet-50

Creating Sparse Models

  • Creating Sparse ResNet-50
  • Creating Sparse YOLOv3
  • Creating Sparse YOLOv5
  • Creating Sparse BERT

Transfer Learning from Sparse Models

  • Transfer Learn - ResNet-50
  • Transfer Learn - YOLOv3
  • Transfer Learn - YOLOv5





## Installation Requirements

See the SparseML Installation Page for install instructions.

## Quick Tour

SparseML enables you to create a sparse model with Sparse Transfer Learning and Sparsification from Scratch.

To enable flexibility, ease of use, and repeatability, each piece of functionality is accomplished via recipes. A recipe encodes the instructions for modifying the model and/or training process as a list of modifiers, which can range from setting the learning rate of the optimizer to applying gradual magnitude pruning. Recipes are written in YAML and stored either as standalone YAML files or in markdown files using YAML front matter. The rest of the SparseML system parses each recipe into a native format for the desired framework and applies the modifications to the model and training pipeline.

To give a sense of what recipes encode, some examples are below:

  • Recipes for Sparse Transfer Learning usually include the !ConstantPruningModifier, which instructs SparseML to maintain the starting level of sparsity while fine-tuning.

  • Recipes for Sparsification from Scratch usually include the !GMPruningModifier, which instructs SparseML to iteratively prune the layers of the model to target sparsity levels (e.g., 80%) over a specified range of epochs.
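To make this concrete, a minimal recipe for sparsification from scratch might look like the following sketch. The parameter values shown are illustrative, not the tuned values used in real SparseZoo recipes:

```yaml
# Illustrative recipe sketch: gradually prune all prunable layers
# from 5% to 80% sparsity between epochs 0 and 30.
modifiers:
    - !GMPruningModifier
        start_epoch: 0.0
        end_epoch: 30.0
        update_frequency: 1.0
        init_sparsity: 0.05
        final_sparsity: 0.8
        params: __ALL_PRUNABLE__
```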

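To visualize the pruning trajectory such a modifier steps through, here is a pure-Python sketch of the cubic interpolation schedule popularized by Zhu & Gupta (2017), which gradual magnitude pruning commonly follows. This is an illustration of the technique, not SparseML's exact internal implementation, and its defaults may differ:

```python
def pruning_sparsity(epoch, start_epoch, end_epoch, init_sparsity, final_sparsity):
    """Target sparsity at a given epoch under a cubic gradual-pruning schedule.

    Sparsity rises quickly early on (when the network has the most redundancy)
    and flattens out as it approaches the final level.
    """
    if epoch <= start_epoch:
        return init_sparsity
    if epoch >= end_epoch:
        return final_sparsity
    progress = (epoch - start_epoch) / (end_epoch - start_epoch)
    return final_sparsity + (init_sparsity - final_sparsity) * (1.0 - progress) ** 3

# Example: prune from 5% to 80% sparsity between epochs 0 and 30.
for epoch in (0, 10, 20, 30):
    target = pruning_sparsity(epoch, 0, 30, 0.05, 0.80)
    print(f"epoch {epoch:2d}: target sparsity {target:.3f}")
```

Note how most of the sparsity is introduced in the first third of the pruning window, giving the network more recovery epochs at high sparsity.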
Recipes are then integrated into deep learning training workflows in one of two ways:

### For Supported Use Cases: CLI

SparseML provides command-line scripts that accept recipes as arguments and perform Sparse Transfer Learning and Sparsification from Scratch. We highly recommend using these scripts for supported use cases. Append --help to any command to see the full list of arguments.

For example, the following command kicks off Sparse Transfer Learning from a pre-sparsified YOLOv5 model onto the VOC dataset, using a pre-made recipe from the SparseZoo:

```bash
sparseml.yolov5.train \
  --data VOC.yaml \
  --cfg models_v5.0/yolov5l.yaml \
  --weights zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned_quant-aggressive_95?recipe_type=transfer \
  --hyp data/hyps/hyp.finetune.yaml \
  --recipe zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned_quant-aggressive_95?recipe_type=transfer
```

And the following command kicks off Sparsification from Scratch of a dense YOLOv5 model, using a pre-made recipe from the SparseZoo:

```bash
sparseml.yolov5.train \
  --cfg models_v5.0/yolov5l.yaml \
  --weights "" \
  --data coco.yaml \
  --hyp data/hyps/hyp.scratch.yaml \
  --recipe zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned_quant-aggressive_95
```

See the documentation for more details on the commands above, as well as examples for other supported use cases.

### For Custom or Supported Use Cases: Python Integration

The ScheduledModifierManager class modifies standard training workflows for both Sparse Transfer Learning and Sparsification from Scratch. It can be used with PyTorch and TensorFlow/Keras.

The manager works by overriding the model and optimizer to encode the sparsity logic. Managers can apply recipes in one-shot or training-aware ways. One-shot is invoked by calling .apply(...) on the manager, while training-aware requires calls to initialize(...) (optional), modify(...), and finalize(...). As a result, only a few lines of code need to be added to begin transfer learning or sparsifying from scratch with pruning and quantization.

For example, the following applies a recipe in a training-aware manner:

```python
from sparseml.pytorch.optim import ScheduledModifierManager

model = Model()  # model definition
optimizer = Optimizer()  # optimizer definition
train_data = TrainData()  # train data definition
batch_size = BATCH_SIZE  # training batch size
steps_per_epoch = len(train_data) // batch_size

manager = ScheduledModifierManager.from_yaml(PATH_TO_RECIPE)
optimizer = manager.modify(model, optimizer, steps_per_epoch)

# ... PyTorch training loop as usual, using the wrapped optimizer ...

manager.finalize(model)
```

Instead of training-aware, the following example shows how to execute a recipe in a one-shot manner:

```python
from sparseml.pytorch.optim import ScheduledModifierManager

model = Model()  # model definition

manager = ScheduledModifierManager.from_yaml(PATH_TO_RECIPE)
manager.apply(model)
```

More information on the codebase and contained processes can be found in the SparseML docs.


## Learning More

## Release History

Official builds are hosted on PyPI. Additional information can be found via GitHub Releases.


## License

The project is licensed under the Apache License Version 2.0.



## Contribute

We appreciate contributions to the code, examples, integrations, and documentation as well as bug reports and feature requests! Learn how here.


## Join the Community

For user help or questions about SparseML, sign up or log in to our Deep Sparse Community Slack. We are growing the community member by member and are happy to see you there. Bugs, feature requests, and additional questions can also be posted to our GitHub Issue Queue.

You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by subscribing to the Neural Magic community.

For more general questions about Neural Magic, please fill out this form.


## Cite

Find this project useful in your research or other communications? Please consider citing:

```bibtex
@inproceedings{pmlr-v119-kurtz20a,
    title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks},
    author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan},
    booktitle = {Proceedings of the 37th International Conference on Machine Learning},
    pages = {5533--5543},
    year = {2020},
    editor = {Hal Daumé III and Aarti Singh},
    volume = {119},
    series = {Proceedings of Machine Learning Research},
    address = {Virtual},
    month = {13--18 Jul},
    publisher = {PMLR},
    abstract = {Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.}
}
```

```bibtex
@misc{singh2020woodfisher,
    title = {WoodFisher: Efficient Second-Order Approximation for Neural Network Compression},
    author = {Sidak Pal Singh and Dan Alistarh},
    year = {2020},
    eprint = {2004.14340},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
```