This page explains how to run a model on DeepSparse for a custom task using the Python Pipelines API. Pipelines wrap key utilities around DeepSparse for easy testing and deployment.
DeepSparse supports a wide range of ONNX operators, enabling performant inference for most models and use cases beyond those available on the SparseZoo.
The CustomTaskPipeline enables you to wrap your model with custom pre-processing and post-processing functions for simple deployment and benchmarking.
In this way, DeepSparse combines the simplicity of Pipelines with GPU-class performance for any use case.
This example requires the DeepSparse General Installation and the SparseML Torchvision Installation.
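If these are not yet set up, both packages are typically installed from PyPI. A minimal sketch of the install commands (assuming a pip-based environment and the current package extras) is:

```bash
pip install deepsparse
pip install "sparseml[torchvision]"
```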
For custom model deployment, export your model to the ONNX model format (create a model.onnx file).
SparseML provides wrappers around the ONNX export classes and APIs to make the export process more straightforward. A sample export of a TorchVision MobileNetV2 model using this API is given below.
```python
import torch
from torchvision.models.mobilenetv2 import mobilenet_v2
from sparseml.pytorch.utils import export_onnx

# Load a pretrained MobileNetV2 and export it to ONNX with SparseML's helper
model = mobilenet_v2(pretrained=True)
sample_batch = torch.randn((1, 3, 224, 224))
export_path = "custom_model.onnx"
export_onnx(model, sample_batch, export_path)
```
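Before deployment, it can be helpful to confirm that the exported file is a valid ONNX graph. The sketch below uses the standard onnx package (a separate dependency, not part of the export API above) to load and check the model:

```python
import onnx

# Load the exported file and run ONNX's structural checker on it
model_proto = onnx.load("custom_model.onnx")
onnx.checker.check_model(model_proto)
print("custom_model.onnx is a valid ONNX model")
```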
Once the model is in the ONNX format, it is ready for inclusion in a CustomTaskPipeline or for benchmarking. Examples of both are given below.
The model.onnx file can be passed into a DeepSparse CustomTaskPipeline via the model_path argument, alongside optional pre-processing and post-processing functions. A sample image is downloaded below and run through the example to test the Pipeline.
```bash
wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
```
Next, the pre-processing and post-processing functions are defined, and the pipeline is instantiated to classify the image file:
```python
from deepsparse.pipelines.custom_pipeline import CustomTaskPipeline
import torch
from torchvision import transforms
from PIL import Image

IMAGENET_RGB_MEANS = [0.485, 0.456, 0.406]
IMAGENET_RGB_STDS = [0.229, 0.224, 0.225]

preprocess_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_RGB_MEANS, std=IMAGENET_RGB_STDS),
])

def preprocess(inputs):
    with open(inputs, "rb") as img_file:
        img = Image.open(img_file)
        img = img.convert("RGB")
        img = preprocess_transforms(img)
    batch = torch.stack([img])
    return [batch.numpy()]  # deepsparse requires a list of numpy array inputs

def postprocess(outputs):
    return outputs  # list of numpy array outputs

custom_pipeline = CustomTaskPipeline(
    model_path="custom_model.onnx",
    process_inputs_fn=preprocess,
    process_outputs_fn=postprocess,
)
inference = custom_pipeline("basilica.jpg")
print(inference)
```

Output (truncated):

```
> [array([[-5.64189434e+00, -2.78636312e+00, -2.62499309e+00, ...
```
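Because the post-processing function above passes the engine outputs through unchanged, the result is a list containing a single numpy array of ImageNet logits. As a follow-up sketch (assuming the list and array shape shown in the output above), the predicted class index can be recovered with an argmax; mapping that index to a human-readable label requires a separate ImageNet class mapping, which is not shown here:

```python
import numpy as np

logits = inference[0]  # numpy array of shape (1, 1000) for this MobileNetV2 export
predicted_class = int(np.argmax(logits, axis=1)[0])
print(f"Predicted ImageNet class index: {predicted_class}")
```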
The DeepSparse installation includes a benchmark CLI, deepsparse.benchmark, for convenient inference performance benchmarking. The CLI accepts either a SparseZoo stub or a path to a local model.onnx file.
The code below provides an example for benchmarking the previously exported MobileNetV2 model. The output shows that the model achieved 441 items per second on a 4-core CPU.
```bash
$ deepsparse.benchmark custom_model.onnx

> DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.0.2 (7dc5fa34) (release) (optimized) (system=avx512, binary=avx512)
> Original Model Path: custom_model.onnx
> Batch Size: 1
> Scenario: async
> Throughput (items/sec): 441.2780
> Latency Mean (ms/batch): 4.5244
> Latency Median (ms/batch): 4.5054
> Latency Std (ms/batch): 0.0774
> Iterations: 4414
```
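Benchmarking can also be driven from Python. The sketch below assumes the compile_model and generate_random_inputs utilities and the Engine.benchmark method found in recent DeepSparse releases; the exact helper names may differ across versions, so treat them as assumptions and check your installed release:

```python
from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs  # assumed helper that builds random sample inputs

model_path = "custom_model.onnx"
batch_size = 1

# Compile the exported ONNX model into a DeepSparse engine and time repeated forward passes
engine = compile_model(model_path, batch_size=batch_size)
sample_inputs = generate_random_inputs(model_path, batch_size)
results = engine.benchmark(sample_inputs)
print(results)
```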