This page explains how to deploy an Image Classification model with DeepSparse.
DeepSparse allows accelerated inference, serving, and benchmarking of sparsified image classification models. This integration enables you to easily deploy sparsified image classification models onto the DeepSparse Engine for GPU-class performance directly on the CPU.
This section requires the DeepSparse Server Install.
Before you start using the DeepSparse Engine, confirm your machine is compatible with our hardware requirements.
To deploy an image classification model using DeepSparse Engine, pass the model in the ONNX format. This grants the engine the flexibility to serve any model in a framework-agnostic environment.
There are two options for creating the model in ONNX format:
This pathway is relevant if you intend to deploy a model created using the SparseML library.
After training your model with SparseML, locate the .pth file for the checkpoint you'd like to export and run the SparseML integrated export script below.
    sparseml.image_classification.export_onnx \
      --arch-key resnet50 \
      --dataset imagenet \
      --dataset-path ~/datasets/ILSVRC2012 \
      --checkpoint-path ~/checkpoints/resnet50_checkpoint.pth
The examples below use SparseZoo stubs. To use the local model instead, pass the path to model.onnx in place of the stub.
This pathway is relevant if you plan to use an off-the-shelf model from the SparseZoo.
All of DeepSparse's Pipelines and APIs can use a SparseZoo stub in place of a local folder. Pipelines use the stubs to locate and download the ONNX and config files from the SparseZoo repo.
The examples use SparseZoo stubs to highlight this pathway.
The examples below use option 2. However, you can pass the local path to the ONNX file as needed.
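As a rough illustration of the two options, the sketch below guesses which kind of value a model path holds. The helper name describe_model_source is hypothetical and is not part of DeepSparse. It assumes that SparseZoo stubs start with "zoo:", as every stub on this page does, and that local models are files ending in ".onnx".

```python
from pathlib import Path

# Assumption: SparseZoo stubs are recognised by this prefix (true for the stubs on this page).
ZOO_PREFIX = "zoo:"


def describe_model_source(model_path: str) -> str:
    """Hypothetical helper: classify a model_path as a stub or a local ONNX file."""
    if model_path.startswith(ZOO_PREFIX):
        return "sparsezoo-stub"
    if Path(model_path).suffix == ".onnx":
        return "local-onnx"
    return "unknown"


stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none"
print(describe_model_source(stub))
print(describe_model_source("exports/model.onnx"))
```

Either kind of value is passed through the same model path argument, so a check like this is only useful for logging or validation in your own scripts.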
DeepSparse provides both a Python Pipeline API and an out-of-the-box model server that can be used for end-to-end inference in either Python workflows or as an HTTP endpoint. Both options provide similar specifications for configurations and support a variety of image classification models.
Pipelines are the default interface for running inference with the DeepSparse Engine.
Once a model is obtained, either through SparseML training or directly from SparseZoo, a Pipeline can be used to facilitate end-to-end inference and deployment of the sparsified image classification model.
If no model is specified to the Pipeline for a given task, the Pipeline will automatically select a pruned and quantized model for the task from the SparseZoo that can be used for accelerated inference. Note that other models in the SparseZoo will have different tradeoffs between speed, size, and accuracy.
As an alternative to the Python API, the DeepSparse Server allows you to serve ONNX models and pipelines over HTTP. Configuring the server uses the same parameters and schemas as the Pipelines, enabling simple deployment. Once launched, a /docs endpoint is created with full endpoint descriptions and support for making sample requests.
An example deployment using a 95% pruned ResNet-50 is given below.
For full documentation on deploying sparse image classification models with the DeepSparse Server, see the documentation for DeepSparse Server.
The following section includes example usage of the Pipeline and server APIs for various image classification models. Each example uses a SparseZoo stub to pull down the model, but a local path to an ONNX file can also be passed as the model_path.

Create a Pipeline to run inference with the following code. The Pipeline handles the pre-processing (e.g., subtracting the ImageNet means and dividing by the ImageNet standard deviations) and post-processing, so you can pass a raw image and receive a class without any extra code.
    from deepsparse import Pipeline

    cv_pipeline = Pipeline.create(
        task='image_classification',
        model_path='zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none',  # Path to checkpoint or SparseZoo stub
        class_names=None  # optional dict / json mapping class ids to labels (if not using ImageNet classes)
    )
    input_image = "my_image.png"  # path to input image
    inference = cv_pipeline(images=input_image)
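To make the pre-processing step concrete, here is a minimal pure-Python sketch of ImageNet-style normalization. It is not DeepSparse's implementation, which may also resize, crop, and reorder channels. The function names are illustrative, and the statistics are the commonly used ImageNet per-channel means and standard deviations for RGB values scaled to [0, 1].

```python
# Commonly used ImageNet per-channel statistics (RGB, values in [0, 1]).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then subtract the mean and divide by the std."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def normalize_image(pixels):
    """Apply normalize_pixel to a nested list shaped [height][width][3]."""
    return [[normalize_pixel(px) for px in row] for row in pixels]


# A 1x2 "image": one black pixel and one white pixel.
tiny_image = [[(0, 0, 0), (255, 255, 255)]]
print(normalize_image(tiny_image))
```

Because the Pipeline performs this kind of step internally, you would only need code like this when feeding a model outside of the Pipeline.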
    deepsparse.server \
      task image_classification \
      --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none" \
      --port 5543
Making a request:
    import requests

    url = 'http://0.0.0.0:5543/predict/from_files'
    path = ['goldfish.jpeg']  # just put the name of images in here
    files = [('request', open(img, 'rb')) for img in path]
    resp = requests.post(url=url, files=files)
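The request above leaves the image files open. One possible variant, sketched below, closes them with contextlib.ExitStack once the request has been sent. The helper names open_request_files and classify_files are hypothetical. The sketch assumes the server answers with a JSON body, and the exact fields of that body are not covered on this page.

```python
from contextlib import ExitStack


def open_request_files(paths, stack):
    """Open each image in binary mode and pair it with the 'request' form field."""
    return [("request", stack.enter_context(open(p, "rb"))) for p in paths]


def classify_files(url, paths):
    """POST the images to a running DeepSparse Server and return the decoded JSON body."""
    import requests  # imported lazily so open_request_files has no third-party dependency

    with ExitStack() as stack:
        files = open_request_files(paths, stack)
        resp = requests.post(url=url, files=files)
    # The files are closed at this point; the response has already been read.
    resp.raise_for_status()
    return resp.json()
```

With the server from the previous command running, a call such as classify_files('http://0.0.0.0:5543/predict/from_files', ['goldfish.jpeg']) would send the same request as the snippet above.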
The mission of Neural Magic is to enable GPU-class inference performance on commodity CPUs. Want to find out how fast our sparse ONNX models perform inference? You can quickly run benchmarking tests on your own with a single CLI command.
You only need to provide the model path of a SparseZoo ONNX model or your own local ONNX model to get started. A run against the 95% pruned ResNet-50 stub reports output like the following:
    Original Model Path: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none
    Batch Size: 1
    Scenario: async
    Throughput (items/sec): 299.2372
    Latency Mean (ms/batch): 16.6677
    Latency Median (ms/batch): 16.6748
    Latency Std (ms/batch): 0.1728
    Iterations: 2995
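If you want to compare runs in a script, the report can be read into a dictionary. The sketch below is not a DeepSparse API: parse_benchmark_report is a hypothetical helper that assumes every line has the form "Key: value". The final estimate applies Little's law (throughput times mean latency) as a rough guess at how many batches were in flight in the async scenario. That interpretation is an assumption, not something the benchmark tool reports.

```python
def parse_benchmark_report(text):
    """Turn 'Key: value' lines into a dict, converting numeric values to float."""
    report = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(": ")
        try:
            report[key] = float(value)
        except ValueError:
            report[key] = value  # keep non-numeric values (paths, scenario names) as strings
    return report


REPORT = """
Original Model Path: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none
Batch Size: 1
Scenario: async
Throughput (items/sec): 299.2372
Latency Mean (ms/batch): 16.6677
Latency Median (ms/batch): 16.6748
Latency Std (ms/batch): 0.1728
Iterations: 2995
"""

report = parse_benchmark_report(REPORT)

# With a batch size of 1, items and batches coincide, so Little's law gives
# an estimate of the average number of batches being processed at once.
in_flight = report["Throughput (items/sec)"] * report["Latency Mean (ms/batch)"] / 1000.0
print(f"Estimated batches in flight: {in_flight:.2f}")
```

For the numbers above the estimate comes out close to five, which is consistent with several requests being processed concurrently rather than one at a time.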
To learn more about benchmarking, refer to the appropriate documentation. Also, check out our Benchmarking Tutorial on GitHub!