Deploying Image Classification Models With DeepSparse

This page explains how to deploy an image classification model with DeepSparse.

DeepSparse allows accelerated inference, serving, and benchmarking of sparsified image classification models. This integration enables you to easily deploy sparsified image classification models with DeepSparse for GPU-class performance directly on the CPU.

Installation Requirements

This use case requires the installation of DeepSparse Server.

Getting Started

Before you start using DeepSparse, confirm your machine is compatible with our hardware requirements.

Model Format

To deploy an image classification model with DeepSparse, pass the model in the ONNX format. This grants DeepSparse the flexibility to serve any model in a framework-agnostic environment.

There are two options for creating the model in ONNX format:

1) Export the ONNX/Config Files From SparseML

This pathway is relevant if you intend to deploy a model created using the SparseML library.

After training your model with SparseML, locate the .pth file for the checkpoint you'd like to export and run the SparseML integrated export script below.

sparseml.image_classification.export_onnx \
    --arch-key resnet50 \
    --dataset imagenet \
    --dataset-path ~/datasets/ILSVRC2012 \
    --checkpoint-path ~/checkpoints/resnet50_checkpoint.pth

This creates a model.onnx file.

The examples below use SparseZoo stubs. To use the local model instead, pass the path to model.onnx in place of the stub.

2) Pass a SparseZoo Stub To DeepSparse

This pathway is relevant if you plan to use an off-the-shelf model from the SparseZoo.

All of DeepSparse's Pipelines and APIs can use a SparseZoo stub in place of a local folder. The Pipelines use the stubs to locate and download the ONNX and configuration files from the SparseZoo repository.

The examples below use SparseZoo stubs to highlight this pathway. However, you can pass the local path to the ONNX file, as needed.
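
The two options differ only in the string passed as the model path. As a minimal sketch, the helper below tells them apart before a model is loaded. The name describe_model_source and its return values are hypothetical and not part of DeepSparse. The "zoo:" prefix is taken from the stubs used on this page.

```python
from pathlib import Path

ZOO_PREFIX = "zoo:"  # every SparseZoo stub on this page starts with this prefix


def describe_model_source(model_path: str) -> str:
    """Hypothetical helper: report what kind of model_path was given."""
    if model_path.startswith(ZOO_PREFIX):
        return "sparsezoo_stub"
    suffix = Path(model_path).suffix.lower()
    if suffix == ".onnx":
        return "local_onnx"
    if suffix == ".pth":
        # A SparseML checkpoint has to be exported to ONNX first (option 1).
        raise ValueError(f"{model_path!r} is a checkpoint; export it to ONNX first")
    return "other_local_path"
```

A check like this can catch the common mistake of passing the .pth checkpoint directly instead of the exported model.onnx.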

Deployment APIs

DeepSparse provides both a Python Pipeline API and an out-of-the-box model server that can be used for end-to-end inference in either Python workflows or as an HTTP endpoint. Both options provide similar specifications for configurations and support a variety of image classification models.

Python API

Pipelines are the default interface for running inference with DeepSparse.

Once a model is obtained, either through SparseML training or directly from SparseZoo, a Pipeline provides end-to-end inference and deployment for the sparsified image classification model.

If no model is passed to the Pipeline for a given task, the Pipeline automatically selects a pruned and quantized model for that task from the SparseZoo for accelerated inference. Note that other models in the SparseZoo offer different tradeoffs between speed, size, and accuracy.

HTTP Server

As an alternative to the Python API, DeepSparse Server allows you to serve ONNX models and pipelines over HTTP. The server is configured with the same parameters and schemas as the Pipelines, which keeps deployment simple. Once launched, the server exposes a /docs endpoint with full endpoint descriptions and support for making sample requests.

An example deployment using a 95% pruned ResNet-50 is given below.

Refer also to the full documentation for DeepSparse Server.

Deployment Examples

The following section includes example usage of the Pipeline and server APIs for various image classification models. Each example uses a SparseZoo stub to pull down the model, but a local path to an ONNX file can also be passed as the model_path.

Python API

Create a Pipeline to run inference with the following code. The Pipeline handles the pre-processing (e.g., subtracting the ImageNet means and dividing by the ImageNet standard deviations) and the post-processing, so you can pass a raw image and receive a class without any extra code.

from deepsparse import Pipeline

cv_pipeline = Pipeline.create(
    task='image_classification',
    model_path='zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none',  # path to a local ONNX file or SparseZoo stub
    class_names=None,  # optional dict / json mapping class ids to labels (if not using ImageNet classes)
)
input_image = "my_image.png"  # path to input image
inference = cv_pipeline(images=input_image)
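
To make the pre-processing step concrete, here is a minimal sketch of per-channel ImageNet normalization in plain Python. The mean and standard deviation values are the ones commonly used for ImageNet-trained models. Whether the Pipeline uses exactly these constants internally is an assumption, and normalize_pixel and normalize_image are illustrative names, not DeepSparse functions.

```python
# Commonly used ImageNet per-channel statistics for RGB values scaled to [0, 1].
# Assumption: the Pipeline's internal constants may differ from these.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], subtract the mean, divide by the std."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def normalize_image(pixels):
    """Normalize an image given as rows of (R, G, B) tuples."""
    return [[normalize_pixel(px) for px in row] for row in pixels]
```

Because the Pipeline applies this step itself, code like this is only needed when feeding tensors to a model outside of a Pipeline.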

HTTP Server

Spinning up:

deepsparse.server \
    task image_classification \
    --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none" \
    --port 5543

Making a request:

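import requests

url = ''  # URL of the server's prediction endpoint (the server above listens on port 5543; see /docs for the route)
path = ['goldfish.jpeg']  # names of the image files to send
files = [('request', open(img, 'rb')) for img in path]
resp = requests.post(url=url, files=files)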
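
A slightly more defensive variant of the request is sketched below. It closes the image files after the upload and raises on HTTP errors. The helper names build_url and post_images are hypothetical, and the route is deliberately left as a parameter because it should be read from the server's /docs endpoint rather than guessed.

```python
from contextlib import ExitStack


def build_url(host: str, port: int, route: str) -> str:
    """Join host, port, and route into an endpoint URL."""
    return f"http://{host}:{port}/{route.lstrip('/')}"


def post_images(url: str, image_paths):
    """POST image files as multipart form data and return the response."""
    import requests  # third-party; imported lazily so build_url works without it

    with ExitStack() as stack:
        # ExitStack closes every opened file once the request has completed.
        files = [("request", stack.enter_context(open(p, "rb"))) for p in image_paths]
        response = requests.post(url=url, files=files)
    response.raise_for_status()  # surface HTTP errors instead of ignoring them
    return response
```

For example, build_url("localhost", 5543, "/docs") gives the address of the interactive documentation for the server started above, where the prediction route can be looked up before calling post_images.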


Benchmarking

The mission of Neural Magic is to enable GPU-class inference performance on commodity CPUs. To find out how fast the sparse ONNX models run inference on your own hardware, you can run a benchmark with a single CLI command.

You only need to provide the model path of a SparseZoo ONNX model or your own local ONNX model to get started:

deepsparse.benchmark zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none

The output is:

Original Model Path: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none
Batch Size: 1
Scenario: async
Throughput (items/sec): 299.2372
Latency Mean (ms/batch): 16.6677
Latency Median (ms/batch): 16.6748
Latency Std (ms/batch): 0.1728
Iterations: 2995
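
The report above is plain "key: value" text, so it is easy to post-process. The sketch below parses it into a dictionary and relates throughput to latency using Little's law (items in flight = throughput × latency). The function parse_benchmark_report is a hypothetical helper, and reading the result as the approximate number of batches being processed concurrently in the async scenario is an interpretation, not something the tool reports.

```python
def parse_benchmark_report(report: str) -> dict:
    """Parse 'Key: value' lines into a dict, converting numeric values to float."""
    results = {}
    for line in report.strip().splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip lines that are not 'Key: value' pairs
        try:
            results[key] = float(value)
        except ValueError:
            results[key] = value  # keep non-numeric values (paths, scenario) as text
    return results


REPORT = """\
Original Model Path: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none
Batch Size: 1
Scenario: async
Throughput (items/sec): 299.2372
Latency Mean (ms/batch): 16.6677
Latency Median (ms/batch): 16.6748
Latency Std (ms/batch): 0.1728
Iterations: 2995
"""

results = parse_benchmark_report(REPORT)
batches_per_sec = results["Throughput (items/sec)"] / results["Batch Size"]
# Little's law: average batches in flight = batch rate * mean latency in seconds.
in_flight = batches_per_sec * results["Latency Mean (ms/batch)"] / 1000.0
print(f"Approximate batches in flight: {in_flight:.2f}")
```

A throughput well above 1000 / latency, as in this report, is consistent with several batches being served at once.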

To learn more about benchmarking, refer to the appropriate documentation. Also, check out our Benchmarking Tutorial on GitHub.
