This page explains how to deploy an Object Detection model with DeepSparse.
DeepSparse allows accelerated inference, serving, and benchmarking of sparsified Ultralytics YOLOv5 models. The Ultralytics integration enables you to easily deploy sparsified YOLOv5 onto the DeepSparse Engine for GPU-class performance directly on the CPU.
This integration currently supports the original YOLOv5 and updated V6.1 architectures.
This section requires the DeepSparse Server+YOLO Install.
Before you start using the DeepSparse Engine, confirm your machine is compatible with our hardware requirements.
To deploy an image classification model using DeepSparse Engine, pass the model in the ONNX format. This grants the engine the flexibility to serve any model in a framework-agnostic environment.
Below we describe two possibilities to obtain the required ONNX model.
This pathway is relevant if you intend to deploy a model created using the SparseML library.
After training your model with SparseML, locate the
.pt file for the model you'd like to export and run the ONNX export script below.
1sparseml.yolov5.export_onnx \2 --weights path/to/your/model \3 --dynamic #Allows for dynamic input shape
model.onnx file, in the local filesystem in the directory of your
The examples below use SparseZoo stubs, but simply pass the path to
model.onnx in place of the stubs to use the local model.
This pathway is relevant if you plan to use an off-the-shelf model from the SparseZoo.
When a SparseZoo stub is passed to the model, DeepSparse downloads the appropriate ONNX and other configuration files from the SparseZoo repo. For example, the SparseZoo stub for the pruned (not quantized) YOLOv5 is:
The Deployment APIs examples use SparseZoo stubs to highlight this pathway.
DeepSparse provides both a Python
Pipeline API and an out-of-the-box model server
that can be used for end-to-end inference in either Python workflows or as an HTTP endpoint.
Both options provide similar specifications for configurations and support annotation serving for all
Pipelines are the default interface for running inference with the DeepSparse Engine.
Once a model is obtained, either through SparseML training or directly from SparseZoo,
Pipeline can be used to easily facilitate end-to-end inference and deployment of the sparsified neural networks.
If no model is specified to the
Pipeline for a given task, the
Pipeline will automatically select a pruned and quantized model for the task from the SparseZoo that can be used for accelerated inference. Note that other models in the SparseZoo will have different tradeoffs between speed, size, and accuracy.
As an alternative to Python API, the DeepSparse Server allows you to serve ONNX models and pipelines in HTTP.
Configuring the server uses the same parameters and schemas as the
Pipelines enabling simple deployment. Once launched, a
/docs endpoint is created with full
endpoint descriptions and support for making sample requests.
An example of starting and requesting a DeepSparse Server for YOLOv5 is given below.
The following section includes example usage of the
Pipeline and server APIs for
various image classification models. Each example uses a SparseZoo stub to pull down the model,
but a local path to an ONNX file can also be passed as the
If you don't have an image ready, pull a sample image down with:
wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
Pipeline and run inference with the following.
1from deepsparse import Pipeline23model_stub = "zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned-aggressive_98"4images = ["basilica.jpg"]56yolo_pipeline = Pipeline.create(7 task="yolo",8 model_path=model_stub,9)1011pipeline_outputs = yolo_pipeline(images=images, iou_thres=0.6, conf_thres=0.001)
You can also use the annotate command to have the engine save an annotated photo on disk.
deepsparse.object_detection.annotate --source basilica.jpg #Try --source 0 to annotate your live webcam feed
Running the above command will create an
annotation-results folder and save the annotated image inside.
--model_filepath arg isn't provided, then
zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 will be used by default.
1deepsparse.server \2 task yolo \3 --model_path "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94"
Making a request:
1import requests2import json34url = 'http://0.0.0.0:5543/predict/from_files'5path = ['basilica.jpg'] # list of images for inference6files = [('request', open(img, 'rb')) for img in path]7resp = requests.post(url=url, files=files)8annotations = json.loads(resp.text) # dictionary of annotation results9bounding_boxes = annotations["boxes"]10labels = annotations["labels"]
The mission of Neural Magic is to enable GPU-class inference performance on commodity CPUs. Want to find out how fast our sparse YOLOv5 ONNX models perform inference? You can quickly do benchmarking tests on your own with a single CLI command!
You only need to provide the model path of a SparseZoo ONNX model or your own local ONNX model to get started:
1deepsparse.benchmark \2 zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 \3 --scenario sync>Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94>Batch Size: 1>Scenario: sync>Throughput (items/sec): 74.0355>Latency Mean (ms/batch): 13.4924>Latency Median (ms/batch): 13.4177>Latency Std (ms/batch): 0.2166>Iterations: 741
To learn more about benchmarking, refer to the appropriate documentation. Also, check out our Benchmarking Tutorial on GitHub!