SparseZoo 0.3

Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes

Overview

SparseZoo is a constantly-growing repository of highly sparse and sparse-quantized models with matching sparsification recipes for neural networks. It simplifies and accelerates your time-to-value in building performant deep learning models with a collection of inference-optimized models and recipes to prototype from.

Available via API and hosted in the cloud, the SparseZoo contains both baseline and models optimized to different degrees of inference performance vs. baseline loss recovery. Recipe-driven approaches built around sparsification algorithms allow you to take the models as given, transfer-learn from the models onto private datasets, or transfer the recipes to your architectures.

This repository contains the Python API code to handle the connection and authentication to the cloud.

Sparsification

Sparsification is the process of taking a trained deep learning model and removing redundant information from the overprecise and over-parameterized network resulting in a faster and smaller model. Techniques for sparsification are all encompassing including everything from inducing sparsity using pruning and quantization to enabling naturally occurring sparsity using activation sparsity or winograd/FFT. When implemented correctly, these techniques result in significantly more performant and smaller models with limited to no effect on the baseline metrics. For example, pruning plus quantization can give noticeable improvements in performance while recovering to nearly the same baseline accuracy.

The Deep Sparse product suite builds on top of sparsification enabling you to easily apply the techniques to your datasets and models using recipe-driven approaches. Recipes encode the directions for how to sparsify a model into a simple, easily editable format.

  • Download a sparsification recipe and sparsified model from the SparseZoo.

  • Alternatively, create a recipe for your model using Sparsify.

  • Apply your recipe with only a few lines of code using SparseML.

  • Finally, for GPU-level performance on CPUs, deploy your sparse-quantized model with the DeepSparse Engine.

Full Deep Sparse product flow: