SparseML enables you to create sparse models from scratch. The library contains state-of-the-art sparsification algorithms, including pruning, distillation, and quantization.
These algorithms are built on top of sparsification recipes, enabling easy integration into custom ML training pipelines to sparsify most neural networks. Additionally, SparseML integrates with popular ML libraries like Hugging Face Transformers and Ultralytics YOLO. With these integrations, creating a recipe and passing it to a CLI is all you need to sparsify a model; applying a recipe in your own training loop takes only a few lines of code, as sketched below.
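As a minimal sketch of how a recipe plugs into a custom PyTorch training pipeline, the snippet below uses SparseML's `ScheduledModifierManager` to wrap an optimizer so the recipe's modifiers (e.g., gradual pruning) run on the recipe's schedule. The `recipe.yaml` path is a placeholder, and the toy model and data are purely illustrative; exact APIs may vary across SparseML versions.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.utils.data import DataLoader, TensorDataset
from sparseml.pytorch.optim import ScheduledModifierManager

# Toy model and data, purely for illustration.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
data = TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,)))
train_loader = DataLoader(data, batch_size=32)

optimizer = SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Load the recipe (placeholder path) and wrap the optimizer so the recipe's
# modifiers are applied automatically as part of each optimizer step.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(int(manager.max_epochs)):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()  # pruning masks update on the recipe's schedule

manager.finalize(model)  # remove modifier hooks once training completes
```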
Aside from sparsification algorithms, SparseML contains generic export pathways for performant deployments. These export pathways ensure the model is saved in the correct format and that the inference graph is rewritten for performance, for example by folding quantized operators. The result is simple export CLIs and APIs that guarantee performance for sparsified models in their given deployment environment.
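For instance, a PyTorch model can be exported to ONNX with SparseML's `ModuleExporter`; the sketch below uses an illustrative toy model, and the output directory and sample batch shape are assumptions for the example.

```python
import torch
from torch import nn
from sparseml.pytorch.utils import ModuleExporter

# Toy model standing in for a sparsified network.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Writes model.onnx (plus sample inputs/outputs) into the given directory,
# tracing the graph with the provided sample batch.
exporter = ModuleExporter(model, output_dir="onnx_export")
exporter.export_onnx(sample_batch=torch.randn(1, 128))
```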
The examples below walk through use cases leveraging SparseML for sparsifying models with recipes and exporting them for performant inference.
More documentation, models, use cases, and examples are continually being added. If you don't see what you're interested in, search the DeepSparse GitHub repo, the SparseML GitHub repo, the SparseZoo website, or ask in the Neural Magic Slack.