DeepSparse Engine supports fast inference on CPUs for sparse and dense models. For sparse models in particular, it achieves GPU-level performance in many use cases.
Around the engine, the DeepSparse package includes various utilities to simplify benchmarking performance and model deployment. For instance:
Pipelines utilities wrap model execution with input pre-processing and output post-processing, simplifying deployment and adding functionality such as multi-stream execution, bucketing, and dynamic shapes.
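To make the wrapping idea concrete, here is a minimal, library-independent sketch of a pipeline that pre-processes input, calls an engine, and post-processes the result, with simple length bucketing. It is not the DeepSparse API: `SketchPipeline`, `toy_engine`, the toy tokenizer, and the bucket sizes are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence


def toy_engine(ids: List[int]) -> float:
    """Hypothetical stand-in for a compiled model: fraction of non-padding positions."""
    return sum(1 for i in ids if i != 0) / len(ids)


class SketchPipeline:
    """Illustrative wrapper (not the DeepSparse API): pre-process -> engine -> post-process."""

    def __init__(self, engine: Callable[[List[int]], float],
                 buckets: Sequence[int], pad_id: int = 0) -> None:
        self.engine = engine
        self.buckets = sorted(buckets)  # allowed fixed input lengths
        self.pad_id = pad_id

    def preprocess(self, text: str) -> List[int]:
        # Toy tokenizer: each whitespace-separated word becomes its length.
        ids = [len(word) for word in text.split()]
        # Bucketing: pick the smallest bucket that fits, else the largest one.
        size = next((b for b in self.buckets if b >= len(ids)), self.buckets[-1])
        ids = ids[:size]  # truncate inputs longer than the largest bucket
        return ids + [self.pad_id] * (size - len(ids))  # pad up to the bucket size

    def postprocess(self, raw: float) -> Dict[str, object]:
        # Turn the raw engine output into a structured, user-facing result.
        return {"score": raw, "label": "dense" if raw > 0.5 else "padded"}

    def __call__(self, text: str) -> Dict[str, object]:
        return self.postprocess(self.engine(self.preprocess(text)))


pipeline = SketchPipeline(toy_engine, buckets=[4, 8])
result = pipeline("fast sparse inference")
```

Because every input is padded or truncated to one of a few fixed lengths, an engine behind such a wrapper only ever sees a small set of input shapes, which is the motivation for bucketing in this sketch.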
The examples below walk through using DeepSparse to test and benchmark ONNX models in integrated use cases.
More documentation, models, use cases, and examples are continually being added. If you don't see one you're interested in, search the DeepSparse GitHub repo, the SparseML GitHub repo, the SparseZoo website, or ask in the Neural Magic Slack.