Neural Magic enables you to deploy deep learning models on commodity CPUs with GPU-class performance.
CPU-based deep learning deployments on commodity hardware are flexible and scalable.
Because DeepSparse reaches GPU-class performance on commodity CPUs, users no longer need to tether deployments to accelerators to reach the performance needed for production. Free from specialized hardware, deployments can take advantage of the flexibility and scalability of software-defined inference.
Simply put, deep learning deployments no longer need to choose between the performance of GPUs and the simplicity of software!
The Neural Magic Platform enables two major workflows: optimizing a model for inference and deploying it on CPUs.
SparseML and SparseZoo work together to optimize models for inference with techniques like pruning and quantization (a process we call "sparsification").
SparseML is an open-source library that extends PyTorch and TensorFlow to simplify the process of applying sparsity algorithms. With simple CLI scripts or about five lines of code, users can sparsify any model from scratch or sparse transfer learn from pre-sparsified versions of foundation models like ResNet, YOLOv5, or BERT.
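For orientation, here is a minimal sketch of the PyTorch workflow, assuming a recent SparseML release and a local `recipe.yaml` describing the pruning/quantization schedule (the recipe path, model, and data below are placeholders, and the exact API may differ between versions):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from sparseml.pytorch.optim import ScheduledModifierManager

# Any torch.nn.Module works; a tiny MLP keeps the sketch self-contained.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 128), torch.randint(0, 10, (256,))),
    batch_size=32,
)

# "recipe.yaml" is a placeholder: a SparseML recipe defining the pruning /
# quantization schedule (recipes are also distributed with SparseZoo models).
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))

for epoch in range(10):  # run the usual training loop; modifiers fire on schedule
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        loss.backward()
        optimizer.step()

manager.finalize(model)  # remove sparsification hooks once training is done
```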
SparseZoo is an open-source repository of pre-sparsified models (for example, sparse ResNet-50 has 95% of its weights set to zero while retaining 99% of the baseline accuracy). SparseZoo is integrated with SparseML, making it trivial for users to fine-tune from a sparse model (which we call "Sparse Transfer Learning") on their own data.
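As an illustration, a pre-sparsified model can be pulled down with the `sparsezoo` Python package; the stub string below shows the general format but is hypothetical, so substitute one browsed from sparsezoo.neuralmagic.com:

```python
from sparsezoo import Model

# Illustrative stub only -- browse sparsezoo.neuralmagic.com for real stubs.
stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none"

model = Model(stub)
model.download()   # fetch the ONNX file, training recipes, and metadata
print(model.path)  # local directory containing the downloaded artifacts
```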
DeepSparse runs inference-optimized sparse models with GPU-class performance on CPUs.
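A hedged sketch of what deployment can look like with DeepSparse's `Pipeline` API follows, assuming the `deepsparse` package is installed; the SparseZoo stub is illustrative and can be swapped for a local ONNX model path:

```python
from deepsparse import Pipeline

# Illustrative SparseZoo stub; any sparse ONNX model path also works here.
pipeline = Pipeline.create(
    task="text-classification",
    model_path="zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none",
)

print(pipeline(["DeepSparse runs sparse models fast on CPUs."]))
```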
The documentation is organized into several sections:
Not Sure Where to Start?
✅ Check out our GitHub repositories and give us a ⭐.
✅ Help us improve this documentation.