
Use an Object Detection Model

This page explains how to run a trained object detection model on DeepSparse using the Python Pipelines API.

Pipelines wrap key utilities around DeepSparse for easy testing and deployment.

The object detection Pipeline, for example, wraps a trained model with the proper pre-processing and post-processing steps, such as non-maximum suppression (NMS). This enables passing raw images and receiving bounding boxes from the DeepSparse Engine without any extra effort. Because all of this is built on top of the DeepSparse Engine, Pipelines combine this simplicity with GPU-class performance on CPUs for sparse models.
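To illustrate what the NMS post-processing step does conceptually, here is a minimal greedy NMS sketch in plain Python. The box format, scores, and threshold are illustrative only, not DeepSparse's actual implementation:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thres=0.6):
    """Keep the highest-scoring box, drop heavily overlapping ones, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thres for j in keep):
            keep.append(i)
    return keep

# Two overlapping candidate boxes and one distant box (made-up values)
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the two overlapping boxes collapse to one
```

The Pipeline runs an equivalent (optimized) step internally, which is why callers receive de-duplicated bounding boxes rather than raw anchor outputs.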

Installation Requirements

This example requires DeepSparse YOLO Installation.

Model Setup

The object detection Pipeline uses Ultralytics YOLOv5 standards and configurations for model setup. The possible files/variables that can be passed in are:

  • model.onnx - Exported YOLOv5 model in the ONNX format.
  • model.yaml - Ultralytics model configuration file containing information about the model architecture and its post-processing.
  • class_names - A list, dictionary, or file containing the index to class name mappings for the trained model.

model.onnx is the only required file. The pipeline will default to a standard setup for the COCO dataset if the model configuration file or class names are not provided.
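As a sketch of the third option, class_names can be supplied either as an in-memory mapping or as a JSON file on disk. The file name and the two-class mapping below are hypothetical:

```python
import json

# Hypothetical index-to-class-name mapping for a two-class custom model
class_names = {"0": "person", "1": "car"}

# Write it out so the Pipeline can be given a file path instead of a dict
with open("class_names.json", "w") as f:
    json.dump(class_names, f)

# Reading it back yields the same mapping the Pipeline would use
with open("class_names.json") as f:
    print(json.load(f)["0"])  # → person
```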

There are two options for passing these files to DeepSparse:

1) Using the SparseZoo

This pathway is relevant if you want to use a pre-sparsified state-of-the-art model off the shelf.

SparseZoo is a repository of pre-trained and pre-sparsified models. DeepSparse supports SparseZoo stubs as inputs for automatic download and inclusion into easy testing and deployment. These models include dense and sparsified versions of YOLOv5 trained on the COCO dataset for performant and general detection, among others. The SparseZoo stubs can be found on SparseZoo model pages; the YOLOv5l stubs used in this guide are:

  • Dense YOLOv5l: zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/base-none
  • Pruned-quantized YOLOv5l: zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned_quant-aggressive_95

These SparseZoo stubs can be passed as arguments to the Pipeline constructor in the examples below.

2) Using a custom local model

This pathway is relevant if you want to use a model fine-tuned on your data with SparseML or a custom model.

There are three steps to using a local model with Pipelines:

  1. Create the model.onnx file (if you trained with SparseML, use the ONNX export script).
  2. Collect the model.yaml file and class_names listed above.
  3. Pass the local paths of the files in place of the SparseZoo stubs.

The examples below use the SparseZoo stubs. Pass the path to the local model in place of the stubs if you want to use a custom model.
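As a minimal sketch of step 3 (the directory layout and file names here are assumptions, not SparseML's actual output layout), collecting the local files into Pipeline arguments might look like:

```python
from pathlib import Path

# Hypothetical export directory produced by a SparseML ONNX export
model_dir = Path("yolov5_export")

# model.onnx is required; model.yaml and class names are optional and
# fall back to standard COCO defaults when omitted
pipeline_kwargs = {
    "task": "yolo",
    "model_path": str(model_dir / "model.onnx"),
}
if (model_dir / "model.yaml").exists():
    pipeline_kwargs["model_config"] = str(model_dir / "model.yaml")
if (model_dir / "class_names.json").exists():
    pipeline_kwargs["class_names"] = str(model_dir / "class_names.json")

print(pipeline_kwargs["model_path"])
```

These keyword arguments would then be passed to Pipeline.create in place of the SparseZoo stub shown in the examples below.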

Inference Pipelines

With the object detection model set up, the model can be passed into a DeepSparse Pipeline via the model_path argument. The sample code below uses the SparseZoo stub for the sparse-quantized YOLOv5l model. It will automatically download the necessary files for the model from the SparseZoo and then compile them on your local machine with DeepSparse. Once compiled, the model Pipeline is ready for inference with images.

First, a sample image is downloaded that will be run through the example to test the pipeline.

$wget -O basilica.jpg

Next, instantiate the Pipeline and pass in the image using the images argument:

from deepsparse import Pipeline

yolo_pipeline = Pipeline.create(
    task="yolo",
    model_path="zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned_quant-aggressive_95",  # if using a custom model, pass the local path to model.onnx
    class_names=None,  # if using a custom model, pass a list of classes the model will classify or a path to a JSON file containing them
    model_config=None,  # if using a custom model, pass the path to a local model config file
)

inference = yolo_pipeline(images=['basilica.jpg'], iou_thres=0.6, conf_thres=0.001)
>predictions=[[[174.3507843017578, 478.4552917480469, 346.09051513671875, 618.4129638671875, ...
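The nested output above pairs each image with its raw box coordinates. A hypothetical sketch of pairing boxes with scores and labels and keeping only confident detections follows; the values and field layout are illustrative, not the exact Pipeline output schema:

```python
# Made-up detections for one image: box coordinates, confidence scores,
# and class labels (values are illustrative only)
boxes = [[174.4, 478.5, 346.1, 618.4], [12.0, 30.0, 80.0, 95.0]]
scores = [0.92, 0.04]
labels = ["person", "bicycle"]

# Keep only detections above a chosen confidence threshold
confident = [
    (label, box)
    for box, score, label in zip(boxes, scores, labels)
    if score >= 0.5
]
print(confident)  # → [('person', [174.4, 478.5, 346.1, 618.4])]
```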


Benchmarking

The DeepSparse installation includes a CLI, deepsparse.benchmark, for convenient performance benchmarking. You can pass a SparseZoo stub or the path to a local model.onnx file.

Dense YOLOv5l

The code below provides an example for benchmarking a dense YOLOv5l model with DeepSparse. The output shows that the model achieved 5.3 items per second on a 4-core CPU.

$deepsparse.benchmark zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/base-none
>DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.0.0 (8eaddc24) (release) (optimized) (system=avx512, binary=avx512)
>Original Model Path: zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/base-none
>Batch Size: 1
>Scenario: async
>Throughput (items/sec): 5.2836
>Latency Mean (ms/batch): 378.2448
>Latency Median (ms/batch): 378.1490
>Latency Std (ms/batch): 2.5183
>Iterations: 54

Sparsified YOLOv5l

Running on the same server, the code below shows how the benchmarks change when utilizing a sparsified version of YOLOv5l. It achieved 19.0 items per second, a 3.6X increase in performance over the dense baseline.

$deepsparse.benchmark zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned_quant-aggressive_95
>DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.0.0 (8eaddc24) (release) (optimized) (system=avx512, binary=avx512)
>Original Model Path: zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned_quant-aggressive_95
>Batch Size: 1
>Scenario: async
>Throughput (items/sec): 18.9863
>Latency Mean (ms/batch): 105.2613
>Latency Median (ms/batch): 105.0656
>Latency Std (ms/batch): 1.6043
>Iterations: 190
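The 3.6X figure follows directly from the two throughput numbers reported above:

```python
dense_throughput = 5.2836     # items/sec, dense YOLOv5l
sparse_throughput = 18.9863   # items/sec, pruned-quantized YOLOv5l

speedup = sparse_throughput / dense_throughput
print(f"{speedup:.1f}X")  # → 3.6X
```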