This page explains how to deploy an image classification model with DeepSparse.
DeepSparse allows accelerated inference, serving, and benchmarking of sparsified image classification models. This integration enables you to easily deploy sparsified image classification models with DeepSparse for GPU-class performance directly on the CPU.
This use case requires the installation of DeepSparse Server.
Before you start using DeepSparse, confirm your machine is compatible with our hardware requirements.
To deploy an image classification model with DeepSparse, pass the model in the ONNX format. This grants DeepSparse the flexibility to serve any model in a framework-agnostic environment.
There are two options for creating the model in ONNX format:
This pathway is relevant if you intend to deploy a model created using the SparseML library.
After training your model with SparseML, locate the .pth file for the checkpoint you'd like to export and run the SparseML integrated export script below.
```bash
sparseml.image_classification.export_onnx \
    --arch-key resnet50 \
    --dataset imagenet \
    --dataset-path ~/datasets/ILSVRC2012 \
    --checkpoint-path ~/checkpoints/resnet50_checkpoint.pth
```
This creates a model.onnx file. The examples below use SparseZoo stubs, but you can simply pass the path to model.onnx in place of the stubs to use the local model.
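If you want to sanity-check the exported file before deploying it, the standard onnx package can load and validate the graph. This optional step is not part of the DeepSparse workflow; the file name model.onnx below matches the export output above.

```python
import onnx

# Load the exported model and run the ONNX checker to confirm the graph is well formed.
model = onnx.load("model.onnx")
onnx.checker.check_model(model)
print("model.onnx passed the ONNX checker")
```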
This pathway is relevant if you plan to use an off-the-shelf model from the SparseZoo.
All of DeepSparse's Pipelines and APIs can use a SparseZoo stub in place of a local folder. The Pipelines use the stubs to locate and download the ONNX and configuration files from the SparseZoo repository.
The examples below use option 2. However, you can pass the local path to the ONNX file, as needed.
DeepSparse provides both a Python Pipeline API and an out-of-the-box model server that can be used for end-to-end inference, either in Python workflows or as an HTTP endpoint. Both options provide similar specifications for configurations and support a variety of image classification models.
Pipelines are the default interface for running inference with DeepSparse. Once a model is obtained, either through SparseML training or directly from the SparseZoo, a Pipeline can be used to facilitate end-to-end inference and deployment of the sparsified image classification model.
If no model is specified to the Pipeline for a given task, the Pipeline will automatically select a pruned and quantized model for that task from the SparseZoo to use for accelerated inference. Note that other models in the SparseZoo offer different tradeoffs between speed, size, and accuracy.
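As a quick illustration of this default behavior, the sketch below creates a Pipeline without a model_path. This assumes the image_classification task ships with a default SparseZoo model; the exact model downloaded may vary between DeepSparse versions.

```python
from deepsparse import Pipeline

# No model_path given: DeepSparse selects a default pruned/quantized
# image classification model from the SparseZoo for this task.
default_pipeline = Pipeline.create(task="image_classification")
prediction = default_pipeline(images="my_image.png")  # my_image.png is a placeholder input
print(prediction)
```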
As an alternative to the Python API, DeepSparse Server allows you to serve ONNX models and pipelines over HTTP. Configuring the server uses the same parameters and schemas as the Pipelines, enabling simple deployment. Once launched, a /docs endpoint is created with full endpoint descriptions and support for making sample requests.
An example deployment using a 95% pruned ResNet-50 is given below.
Refer also to the full documentation for DeepSparse Server.
The following section includes example usage of the Pipeline and server APIs for various image classification models. Each example uses a SparseZoo stub to pull down the model, but a local path to an ONNX file can also be passed as the model_path.
Create a Pipeline to run inference with the following code. The Pipeline handles the pre-processing (e.g., subtracting the ImageNet means and dividing by the ImageNet standard deviation) and post-processing, so you can pass a raw image and receive a class without any extra code.
```python
from deepsparse import Pipeline

cv_pipeline = Pipeline.create(
    task='image_classification',
    model_path='zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none',  # path to checkpoint or SparseZoo stub
    class_names=None,  # optional dict / json mapping class ids to labels (if not using ImageNet classes)
)

input_image = "my_image.png"  # path to input image
inference = cv_pipeline(images=input_image)
```
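The returned object holds the predicted class ids and confidence scores. The sketch below assumes the image classification output schema exposes labels and scores fields, as in recent DeepSparse releases.

```python
# Inspect the top prediction for the first (and only) input image.
# Field names assume the image classification output schema (`labels`, `scores`).
print(inference.labels[0], inference.scores[0])
```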
Spinning up:
```bash
deepsparse.server \
    task image_classification \
    --model_path "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none" \
    --port 5543
```
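Once the server reports it is running, the automatically generated /docs endpoint can be opened in a browser or queried from the command line to explore the available routes. The host and port below assume the launch command above.

```bash
# Swagger-style API docs for the running server (host/port from the launch command above)
curl http://0.0.0.0:5543/docs
```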
Making a request:
```python
import requests

url = 'http://0.0.0.0:5543/predict/from_files'
path = ['goldfish.jpeg']  # just put the names of the images here
files = [('request', open(img, 'rb')) for img in path]
resp = requests.post(url=url, files=files)
```
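Printing the response body is a convenient way to confirm the request succeeded; this assumes the server returns a JSON payload mirroring the pipeline output.

```python
# The server responds with JSON containing the predicted labels and scores.
print(resp.status_code)
print(resp.json())
```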
The mission of Neural Magic is to enable GPU-class inference performance on commodity CPUs. Want to find out how fast our sparse ONNX models perform inference? You can quickly run benchmarking tests on your own with a single CLI command.
You only need to provide the model path of a SparseZoo ONNX model or your own local ONNX model to get started:
```bash
deepsparse.benchmark zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none
```
The output is:
```
Original Model Path: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none
Batch Size: 1
Scenario: async
Throughput (items/sec): 299.2372
Latency Mean (ms/batch): 16.6677
Latency Median (ms/batch): 16.6748
Latency Std (ms/batch): 0.1728
Iterations: 2995
```
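The run above uses the default settings (batch size 1, async scenario). The sketch below adjusts the batch size and switches to a synchronous, latency-oriented scenario; the -b and -s flags assume the options accepted by deepsparse.benchmark in recent releases.

```bash
# Benchmark at batch size 64 in the synchronous (latency-focused) scenario.
# -b sets the batch size, -s selects the scenario.
deepsparse.benchmark \
    zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none \
    -b 64 -s sync
```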
To learn more about benchmarking, refer to the appropriate documentation. Also, check out our Benchmarking Tutorial on GitHub.