🚨 Note: The current Docs site is outdated. Neural Magic's 1.7 release, slated for January 2024, will include a Docs refresh. Meanwhile, please consult our GitHub repositories for the content: DeepSparse, SparseML, SparseZoo.

Use a Custom Use Case

This page explains how to run a model on DeepSparse for a custom task inside a Python API called Pipelines.

Pipelines wrap key utilities around DeepSparse for easy testing and deployment.

DeepSparse supports many ONNX operators, so it can accelerate most models and use cases, including those beyond the ones available in the SparseZoo. The CustomTaskPipeline lets you wrap your model with custom pre-processing and post-processing functions for simple deployment and benchmarking. In this way, DeepSparse combines the simplicity of Pipelines with GPU-class performance for any use case.
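Conceptually, a custom pipeline chains three steps. A pre-processing function turns the raw input into a list of arrays, the engine runs the ONNX model on them, and a post-processing function turns the engine outputs into the final result. The dependency-free sketch below models only that composition. `ToyPipeline` and `fake_engine` are hypothetical names, and this is not how DeepSparse implements CustomTaskPipeline internally.

```python
from typing import Any, Callable, List


class ToyPipeline:
    """Hypothetical stand-in that mimics the pre-process -> engine -> post-process flow."""

    def __init__(
        self,
        engine: Callable[[List[Any]], List[Any]],
        process_inputs_fn: Callable[[Any], List[Any]] = lambda x: x,
        process_outputs_fn: Callable[[List[Any]], Any] = lambda y: y,
    ) -> None:
        self.engine = engine
        self.process_inputs_fn = process_inputs_fn
        self.process_outputs_fn = process_outputs_fn

    def __call__(self, raw_input: Any) -> Any:
        engine_inputs = self.process_inputs_fn(raw_input)  # raw input -> list of tensors
        engine_outputs = self.engine(engine_inputs)  # list of tensors -> list of tensors
        return self.process_outputs_fn(engine_outputs)  # list of tensors -> final result


def fake_engine(inputs: List[List[float]]) -> List[List[float]]:
    # Hypothetical "model": doubles every value of every input tensor
    return [[2.0 * v for v in tensor] for tensor in inputs]


pipeline = ToyPipeline(
    engine=fake_engine,
    # Pre-process: one "tensor" holding the length of each word
    process_inputs_fn=lambda text: [[float(len(word)) for word in text.split()]],
    # Post-process: reduce the first output tensor to a single score
    process_outputs_fn=lambda outputs: max(outputs[0]),
)
result = pipeline("sparse models run fast")
```

Leaving either processing function at its identity default mirrors the fact that both are optional arguments.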

Installation Requirements

This example requires the DeepSparse general installation and the SparseML Torchvision installation.

Model Setup

For custom model deployment, first export your model to the ONNX format (a model.onnx file). SparseML provides wrappers around the ONNX export classes and APIs that simplify the export process. Below is a sample export of a torchvision MobileNetV2 model using this API.

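```python
import torch
from torchvision.models.mobilenetv2 import mobilenet_v2
from sparseml.pytorch.utils import export_onnx

# Load an ImageNet-pretrained MobileNetV2 and put it in inference mode
model = mobilenet_v2(pretrained=True)
model.eval()

# Sample input used to trace the model: (batch, channels, height, width)
sample_batch = torch.randn((1, 3, 224, 224))
export_path = "custom_model.onnx"
export_onnx(model, sample_batch, export_path)
```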

Once the model is in ONNX format, it can be wrapped in a CustomTaskPipeline or benchmarked. Examples of both follow.

Inference Pipelines

The exported ONNX file (custom_model.onnx in this example) can be passed to a DeepSparse CustomTaskPipeline through the model_path argument, along with optional pre-processing and post-processing functions.
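The pre-processing function must turn a raw input into the list of numpy arrays that the engine consumes. Here that means a single float array with the same (1, 3, 224, 224) shape as the sample_batch used at export. As a torch-free illustration of what ToTensor and Normalize do, here is a minimal numpy sketch. It assumes the image is already resized and center-cropped to 224x224 and stored as uint8 in height-width-channel order. The name `numpy_preprocess` is hypothetical.

```python
import numpy as np

# Same ImageNet statistics as the torchvision example on this page
IMAGENET_RGB_MEANS = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_RGB_STDS = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def numpy_preprocess(image_hwc: np.ndarray) -> list:
    """Hypothetical torch-free approximation of ToTensor() + Normalize() for one RGB image.

    Assumes `image_hwc` is already resized/cropped to 224x224, uint8, in HWC order.
    """
    if image_hwc.shape != (224, 224, 3) or image_hwc.dtype != np.uint8:
        raise ValueError(
            f"expected uint8 array of shape (224, 224, 3), got {image_hwc.dtype} {image_hwc.shape}"
        )
    scaled = image_hwc.astype(np.float32) / 255.0  # ToTensor: [0, 255] -> [0.0, 1.0]
    normalized = (scaled - IMAGENET_RGB_MEANS) / IMAGENET_RGB_STDS  # per-channel (x - mean) / std
    chw = np.transpose(normalized, (2, 0, 1))  # HWC -> CHW
    batch = np.ascontiguousarray(chw[np.newaxis, ...])  # add batch dim -> (1, 3, 224, 224)
    return [batch]  # the engine expects a list of arrays, one per model input


# Usage: an all-white image maps every channel to (1.0 - mean) / std
white = np.full((224, 224, 3), 255, dtype=np.uint8)
engine_inputs = numpy_preprocess(white)
```

Resizing and center-cropping are deliberately left out here. The torchvision version in the next example handles them.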

First, download a sample image to run through the Pipeline as a test:

```bash
wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
```

Next, define the pre-processing and post-processing functions and instantiate the pipeline that classifies the image file:

```python
from deepsparse.pipelines.custom_pipeline import CustomTaskPipeline
import torch
from torchvision import transforms
from PIL import Image

# ImageNet channel statistics used to normalize the input image
IMAGENET_RGB_MEANS = [0.485, 0.456, 0.406]
IMAGENET_RGB_STDS = [0.229, 0.224, 0.225]
preprocess_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_RGB_MEANS, std=IMAGENET_RGB_STDS),
])

def preprocess(inputs):
    # `inputs` is the path to an image file
    with open(inputs, "rb") as img_file:
        img = Image.open(img_file)
        img = img.convert("RGB")
    img = preprocess_transforms(img)  # float tensor of shape (3, 224, 224)
    batch = torch.stack([img])  # add a batch dimension -> (1, 3, 224, 224)
    return [batch.numpy()]  # deepsparse requires a list of numpy array inputs

def postprocess(outputs):
    return outputs  # list of numpy array outputs (raw logits)

custom_pipeline = CustomTaskPipeline(
    model_path="custom_model.onnx",
    process_inputs_fn=preprocess,
    process_outputs_fn=postprocess,
)

inference = custom_pipeline("basilica.jpg")
print(inference)
# [array([[-5.64189434e+00, -2.78636312e+00, -2.62499309e+00, ...
```
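The postprocess function above returns the raw logits. For an ImageNet classifier such as MobileNetV2, these have shape (1, 1000). To get class predictions instead, a process_outputs_fn could apply a softmax and keep the top-k classes. The function below is a hypothetical sketch of that idea. It does not map indices to ImageNet label names.

```python
import numpy as np


def top_k_postprocess(outputs: list, k: int = 5) -> list:
    """Hypothetical post-processing: raw logits -> [(class_index, probability), ...]."""
    logits = outputs[0][0]  # first output tensor, first (only) item in the batch
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    probs = exps / np.sum(exps)  # softmax over the class dimension
    top = np.argsort(probs)[::-1][:k]  # indices of the k largest probabilities
    return [(int(i), float(probs[i])) for i in top]


# Usage with fake logits for a 4-class model
fake_outputs = [np.array([[0.0, 2.0, 1.0, -1.0]], dtype=np.float32)]
predictions = top_k_postprocess(fake_outputs, k=2)
```

In the pipeline, such a function could be passed as process_outputs_fn=top_k_postprocess in place of postprocess.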

Benchmarking

The DeepSparse installation includes deepsparse.benchmark, a CLI for measuring inference performance. It accepts either a SparseZoo stub or a path to a local ONNX file.

The command below benchmarks the previously exported MobileNetV2 model. The output shows a throughput of about 441 items per second on a 4-core CPU.

```
$ deepsparse.benchmark custom_model.onnx
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.0.2 (7dc5fa34) (release) (optimized) (system=avx512, binary=avx512)
Original Model Path: custom_model.onnx
Batch Size: 1
Scenario: async
Throughput (items/sec): 441.2780
Latency Mean (ms/batch): 4.5244
Latency Median (ms/batch): 4.5054
Latency Std (ms/batch): 0.0774
Iterations: 4414
```
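The reported numbers can be cross-checked with simple arithmetic. Dividing the items processed by the throughput gives the approximate run duration. Little's law (average requests in flight = throughput × latency) then estimates how many requests the async scenario kept in flight at once. This is a back-of-the-envelope sketch using only the figures printed above. It assumes items equal batches at batch size 1, and it does not describe how deepsparse.benchmark computes its statistics.

```python
# Figures copied from the benchmark output above
batch_size = 1
throughput_items_per_sec = 441.2780
latency_mean_ms = 4.5244
iterations = 4414

# Items processed / throughput ~= wall-clock duration of the run
implied_duration_s = iterations * batch_size / throughput_items_per_sec

# Throughput a single stream would reach if it ran one batch after another
single_stream_items_per_sec = batch_size * 1000.0 / latency_mean_ms

# Little's law: requests in flight = arrival rate (throughput) x time in system (latency)
concurrency_estimate = throughput_items_per_sec * (latency_mean_ms / 1000.0) / batch_size

print(f"implied duration: {implied_duration_s:.2f} s")
print(f"single-stream throughput: {single_stream_items_per_sec:.1f} items/sec")
print(f"approx. requests in flight: {concurrency_estimate:.2f}")
```

With these figures, the run lasted roughly 10 seconds. The throughput is about twice what one sequential stream would reach at the mean latency, which suggests that about two requests were processed concurrently.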