Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
SparseML is a toolkit that includes APIs, CLIs, scripts and libraries that apply state-of-the-art sparsification algorithms such as pruning and quantization to any neural network. General, recipe-driven approaches built around these algorithms enable the simplification of creating faster and smaller models for the ML performance community at large.
Sparsification is the process of taking a trained deep learning model and removing redundant information from the overprecise and over-parameterized network resulting in a faster and smaller model. Techniques for sparsification are all encompassing including everything from inducing sparsity using pruning and quantization to enabling naturally occurring sparsity using activation sparsity or winograd/FFT. When implemented correctly, these techniques result in significantly more performant and smaller models with limited to no effect on the baseline metrics. For example, pruning plus quantization can give noticeable improvements in performance while recovering to nearly the same baseline accuracy.
The Deep Sparse product suite builds on top of sparsification enabling you to easily apply the techniques to your datasets and models using recipe-driven approaches. Recipes encode the directions for how to sparsify a model into a simple, easily editable format. - Download a sparsification recipe and sparsified model from the SparseZoo. - Alternatively, create a recipe for your model using Sparsify. - Apply your recipe with only a few lines of code using SparseML. - Finally, for GPU-level performance on CPUs, deploy your sparse-quantized model with the DeepSparse Engine.
Full Deep Sparse product flow:
Resources and Learning More
Additionally, more information can be found via GitHub Releases.
- Quick Tour
- Sparsification Recipes