
Deploy a Model

The DeepSparse package ships with a built-in server for easy, performant model deployment. The server exposes an HTTP interface for communicating with the deployed model and running inference on it, as an alternative to the Python APIs or CLIs. It is a production-ready model server built on Neural Magic's sparsification solutions, resulting in faster and cheaper deployments.

The inference server is built with performance and flexibility in mind, with support for multiple models and multiple simultaneous streams. It is also designed to plug into many MLOps deployment platforms, including Kubernetes and AWS SageMaker.
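
As a rough illustration of what talking to the server over HTTP could look like from a client, here is a minimal sketch using only the Python standard library. The address, the `/predict` route, and the `sequences` payload field are assumptions made for this example; the actual values depend on how the server was launched and which task it serves, so check the server's own output or generated API docs before relying on them. The helper names `build_inference_request` and `run_inference` are hypothetical and not part of DeepSparse.

```python
import json
import urllib.request

# Assumptions for illustration only: the host, port, route, and payload
# fields below are placeholders, not values confirmed by this page.
SERVER_URL = "http://localhost:5543"  # hypothetical server address
PREDICT_ROUTE = "/predict"            # hypothetical inference route


def build_inference_request(base_url, route, payload):
    """Build a JSON POST request for an HTTP inference endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + route,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_inference(request, timeout=10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical payload for a text classification model.
request = build_inference_request(
    SERVER_URL,
    PREDICT_ROUTE,
    {"sequences": ["The server responded quickly."]},
)
print(request.get_method(), request.full_url)

# With a server actually running at SERVER_URL, the call would be:
# print(run_inference(request))
```

Keeping request construction separate from the network call makes it possible to inspect or log the exact request before any server is involved.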

Example Use Cases

The docs below walk through use cases leveraging DeepSparse Server for deployment.

Other Use Cases

More documentation, models, use cases, and examples are continually being added. If you don't see one you're interested in, search the DeepSparse GitHub repo, the SparseML GitHub repo, the SparseZoo website, or ask in the Neural Magic Slack.
