Deploying Image Classification Models with DeepSparse

This page explains how to deploy an Image Classification model with DeepSparse.

DeepSparse allows accelerated inference, serving, and benchmarking of sparsified image classification models. These integrations enable you to easily deploy sparsified image classification models onto the DeepSparse Engine for GPU-class performance directly on the CPU.

Installation Requirements

This section requires the DeepSparse Server Install.

Getting Started

Before you start using the DeepSparse Engine, confirm your machine is compatible with our hardware requirements.

Model Format

To deploy an image classification model using the DeepSparse Engine, pass the model in the ONNX format. This grants the engine the flexibility to serve any model in a framework-agnostic environment.

There are two options for creating the model in ONNX format:

1) Export the ONNX/Config Files From SparseML

This pathway is relevant if you intend to deploy a model created using the SparseML library.

After training your model with SparseML, locate the .pth file for the checkpoint you'd like to export and run the SparseML integrated export script below.

sparseml.image_classification.export_onnx \
    --arch-key resnet50 \
    --dataset imagenet \
    --dataset-path ~/datasets/ILSVRC2012 \
    --checkpoint-path ~/checkpoints/resnet50_checkpoint.pth

This creates a model.onnx file.
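
If you would rather launch the export from a Python script, the command above can be assembled as an argument list. The sketch below is only illustrative: `build_export_command` is a hypothetical helper, not part of SparseML, and it uses only the flags shown above. It expands `~` explicitly because `subprocess` does not invoke a shell by default.

```python
import os
import subprocess


def build_export_command(arch_key, dataset, dataset_path, checkpoint_path):
    """Assemble the export command shown above as an argument list.

    Hypothetical helper for illustration; not part of SparseML.
    """
    return [
        "sparseml.image_classification.export_onnx",
        "--arch-key", arch_key,
        "--dataset", dataset,
        # subprocess does not expand "~" without a shell, so do it here.
        "--dataset-path", os.path.expanduser(dataset_path),
        "--checkpoint-path", os.path.expanduser(checkpoint_path),
    ]


command = build_export_command(
    arch_key="resnet50",
    dataset="imagenet",
    dataset_path="~/datasets/ILSVRC2012",
    checkpoint_path="~/checkpoints/resnet50_checkpoint.pth",
)
print(" ".join(command))

# Uncomment to run the export (requires SparseML to be installed):
# subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues if the dataset or checkpoint path contains spaces.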

The examples below use SparseZoo stubs, but simply pass the path to model.onnx in place of the stubs to use the local model.

2) Pass a SparseZoo Stub To DeepSparse

This pathway is relevant if you plan to use an off-the-shelf model from the SparseZoo.

All of DeepSparse's Pipelines and APIs can use a SparseZoo stub in place of a local folder. The Pipelines use the stubs to locate and download the ONNX and config files from the SparseZoo repo.

The examples below use option 2 to highlight this pathway. However, you can pass the local path to the ONNX file as needed.
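
To illustrate how the two options can feed the same deployment code, here is a minimal sketch that accepts either a SparseZoo stub or a local file. `resolve_model_path` is a hypothetical helper, not part of DeepSparse, and it assumes the local option is always a single `.onnx` file as in option 1.

```python
from pathlib import Path

# SparseZoo stub used in the examples on this page.
DEFAULT_STUB = (
    "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none"
)


def resolve_model_path(source=None):
    """Return a SparseZoo stub unchanged, or a validated local ONNX path.

    Hypothetical helper for illustration; not part of DeepSparse.
    """
    if source is None:
        return DEFAULT_STUB
    if source.startswith("zoo:"):
        # Option 2: leave the stub as-is so DeepSparse can fetch the model.
        return source
    # Option 1: a local file exported from SparseML.
    path = Path(source).expanduser()
    if path.suffix != ".onnx":
        raise ValueError(f"expected an .onnx file, got: {path}")
    if not path.is_file():
        raise FileNotFoundError(f"no such model file: {path}")
    return str(path)


print(resolve_model_path())
```

The returned string can then be passed wherever the examples use a stub. Checking the path up front means a typo surfaces before any model loading starts.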

Deployment APIs

DeepSparse provides both a Python Pipeline API and an out-of-the-box model server that can be used for end-to-end inference in either Python workflows or as an HTTP endpoint. Both options provide similar specifications for configurations and support a variety of image classification models.

Python API

Pipelines are the default interface for running inference with the DeepSparse Engine.

Once a model is obtained, either through SparseML training or directly from SparseZoo, a Pipeline handles end-to-end inference and deployment of the sparsified image classification model.

If no model is specified to the Pipeline for a given task, the Pipeline will automatically select a pruned and quantized model for the task from the SparseZoo that can be used for accelerated inference. Note that other models in the SparseZoo will have different tradeoffs between speed, size, and accuracy.

HTTP Server

As an alternative to the Python API, the DeepSparse Server allows you to serve ONNX models and pipelines over HTTP. Configuring the server uses the same parameters and schemas as the Pipelines, enabling simple deployment. Once launched, a /docs endpoint is created with full endpoint descriptions and support for making sample requests.

An example deployment using a 95% pruned ResNet-50 is given below.

For full documentation on deploying sparse image classification models with the DeepSparse Server, see the documentation for DeepSparse Server.

Deployment Examples

The following section includes example usage of the Pipeline and server APIs for various image classification models. Each example uses a SparseZoo stub to pull down the model, but a local path to an ONNX file can also be passed as the model_path.

Python API

Create a Pipeline to run inference with the following code. The Pipeline handles the pre-processing (e.g., subtracting the ImageNet means, dividing by the ImageNet standard deviations) and post-processing so you can pass a raw image and receive a class without any extra code.

from deepsparse import Pipeline

cv_pipeline = Pipeline.create(
    task='image_classification',
    model_path='zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none',  # path to checkpoint or SparseZoo stub
    class_names=None,  # optional dict / JSON mapping class ids to labels (if not using ImageNet classes)
)

input_image = "my_image.png"  # path to input image
inference = cv_pipeline(images=input_image)
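
To make the pre-processing step concrete, here is a minimal sketch of per-channel normalization in plain Python. The mean and standard deviation values are the commonly used ImageNet statistics for RGB inputs scaled to [0, 1]. Whether the Pipeline uses exactly these constants is an assumption here, and `normalize_pixel` is a hypothetical name.

```python
# Commonly used ImageNet per-channel statistics (RGB, inputs scaled to [0, 1]).
# Assumption: the Pipeline's own constants may differ.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Normalize one 8-bit RGB pixel: scale to [0, 1], subtract mean, divide by std."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


# A pixel equal to the channel means maps to (approximately) zero.
mean_pixel = tuple(m * 255.0 for m in IMAGENET_MEAN)
print(normalize_pixel(mean_pixel))
print(normalize_pixel((255, 255, 255)))
```

With the Pipeline you do not need to write this yourself. The sketch only shows what "subtracting the mean and dividing by the standard deviation" amounts to for a single pixel.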

HTTP Server

Spinning up:

deepsparse.server \
    task image_classification \
    --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none" \
    --port 5543

Making a request:

import requests

url = ''  # URL of the running server's prediction endpoint (listed under /docs)
path = ['goldfish.jpeg']  # names of the image files to send
files = [('request', open(img, 'rb')) for img in path]
resp = requests.post(url=url, files=files)
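
The request snippet above leaves the image files open after the call. As a hedged variation, the sketch below opens the files through `contextlib.ExitStack` so they are closed once the request completes, including when several images are sent at once. `open_request_files` and `send_images` are hypothetical helpers; the `'request'` field name is taken from the snippet above, and the endpoint URL is left for you to supply.

```python
from contextlib import ExitStack


def open_request_files(stack, image_paths, field_name="request"):
    """Open each image and pair it with the form field name used above.

    Files are registered on `stack`, so they are closed when it exits.
    Hypothetical helper for illustration.
    """
    return [
        (field_name, stack.enter_context(open(path, "rb")))
        for path in image_paths
    ]


def send_images(url, image_paths):
    """POST the images to `url`; requires the `requests` package."""
    import requests  # imported lazily so the helper above has no dependencies

    with ExitStack() as stack:
        files = open_request_files(stack, image_paths)
        return requests.post(url, files=files)
```

A call would look like `send_images(url, ['goldfish.jpeg'])`, with `url` set to the prediction endpoint of your running server.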


Benchmarking

The mission of Neural Magic is to enable GPU-class inference performance on commodity CPUs. Want to find out how fast our sparse ONNX models perform inference? You can quickly run benchmarking tests on your own with a single CLI command.

You only need to provide the model path of a SparseZoo ONNX model or your own local ONNX model to get started:

deepsparse.benchmark zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none


Original Model Path: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none
Batch Size: 1
Scenario: async
Throughput (items/sec): 299.2372
Latency Mean (ms/batch): 16.6677
Latency Median (ms/batch): 16.6748
Latency Std (ms/batch): 0.1728
Iterations: 2995
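
As a rough way to read these numbers together, the sketch below parses the sample output above and applies Little's law (items in flight = throughput × latency). The interpretation is an assumption: it treats the reported mean latency as the time each batch spends in the engine during the async scenario. `parse_benchmark` is a hypothetical helper, not part of DeepSparse.

```python
BENCHMARK_OUTPUT = """\
Original Model Path: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none
Batch Size: 1
Scenario: async
Throughput (items/sec): 299.2372
Latency Mean (ms/batch): 16.6677
Latency Median (ms/batch): 16.6748
Latency Std (ms/batch): 0.1728
Iterations: 2995
"""


def parse_benchmark(text):
    """Split each 'Key: value' line into a dict entry (values kept as strings)."""
    results = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            results[key] = value
    return results


stats = parse_benchmark(BENCHMARK_OUTPUT)
throughput = float(stats["Throughput (items/sec)"])
latency_s = float(stats["Latency Mean (ms/batch)"]) / 1000.0
batch_size = int(stats["Batch Size"])
iterations = int(stats["Iterations"])

# Little's law: items in flight = throughput * time each item spends in the system.
in_flight = throughput * latency_s
# Approximate wall-clock duration of the measured run.
duration_s = iterations * batch_size / throughput

print(f"items in flight: {in_flight:.2f}")
print(f"approx. duration: {duration_s:.1f} s")
```

For this sample, the arithmetic suggests roughly five items being processed concurrently over a run of about ten seconds. The output itself does not state the number of streams, so treat this as an estimate.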

To learn more about benchmarking, refer to the appropriate documentation. Also, check out our Benchmarking Tutorial on GitHub!
