This page explains how to run a model on the DeepSparse Engine for a custom task inside a Python API called Pipelines.
Pipelines wrap key utilities around the DeepSparse Engine for easy testing and deployment.
The DeepSparse Engine supports many ONNX operators, enabling performant inference for most models and use cases beyond those available on the SparseZoo.
CustomTaskPipeline enables you to wrap your model with custom pre- and post-processing functions for simple deployment and benchmarking.
In this way, the simplicity of Pipelines is combined with the performance of DeepSparse for arbitrary use cases.
For custom model deployment, first export your model to the ONNX model format (that is, create an ONNX model file).
SparseML provides wrapper classes and APIs for ONNX export that make the export process more straightforward.
A sample export using this API for a MobileNetV2 TorchVision model is given below.
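```python
import torch
from torchvision.models.mobilenetv2 import mobilenet_v2
from sparseml.pytorch.utils import export_onnx

# Load a pretrained MobileNetV2 from TorchVision
model = mobilenet_v2(pretrained=True)
# Sample input that fixes the exported input shape: (batch, channels, height, width)
sample_batch = torch.randn((1, 3, 224, 224))
export_path = "custom_model.onnx"
# Write the ONNX file to export_path
export_onnx(model, sample_batch, export_path)
```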
Once the model is in the ONNX format, it is ready for inclusion in a CustomTaskPipeline or for benchmarking.
Examples for both are given below.
The model.onnx file can be passed into a DeepSparse CustomTaskPipeline using the model_path argument, alongside optional pre- and post-processing functions.
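Conceptually, such a pipeline chains three steps: preprocess the raw input, run the engine, and postprocess the engine output. The sketch below illustrates only that call flow. `SketchPipeline` and `fake_engine` are hypothetical stand-ins written for this illustration; they are not part of DeepSparse and make no claim about how CustomTaskPipeline is implemented.

```python
from typing import Any, Callable, List, Optional


class SketchPipeline:
    """Hypothetical stand-in that mimics the call flow only; not DeepSparse code."""

    def __init__(
        self,
        engine_fn: Callable[[Any], Any],
        process_inputs_fn: Optional[Callable[[Any], Any]] = None,
        process_outputs_fn: Optional[Callable[[Any], Any]] = None,
    ):
        self.engine_fn = engine_fn
        # Both hooks are optional; fall back to passing values through unchanged
        self.process_inputs_fn = process_inputs_fn or (lambda x: x)
        self.process_outputs_fn = process_outputs_fn or (lambda x: x)

    def __call__(self, raw_input: Any) -> Any:
        engine_inputs = self.process_inputs_fn(raw_input)
        engine_outputs = self.engine_fn(engine_inputs)
        return self.process_outputs_fn(engine_outputs)


def fake_engine(inputs: List[List[float]]) -> List[List[float]]:
    # Stands in for model inference: doubles every value
    return [[2.0 * value for value in row] for row in inputs]


pipeline = SketchPipeline(
    engine_fn=fake_engine,
    # Turn a comma-separated string into a batch containing one row of floats
    process_inputs_fn=lambda text: [[float(t) for t in text.split(",")]],
    # Unwrap the single row from the batch
    process_outputs_fn=lambda outputs: outputs[0],
)
result = pipeline("1,2,3")
print(result)
```

Keeping the two hooks as plain functions means the same model file can be reused with different input types by swapping only the preprocessing step.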
First, download a sample image to run through the example pipeline:
```bash
wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
```
Next, the pre- and post-processing functions are defined, and the pipeline that classifies the image file is instantiated:
```python
from deepsparse.pipelines.custom_pipeline import CustomTaskPipeline
import torch
from torchvision import transforms
from PIL import Image

IMAGENET_RGB_MEANS = [0.485, 0.456, 0.406]
IMAGENET_RGB_STDS = [0.229, 0.224, 0.225]
preprocess_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_RGB_MEANS, std=IMAGENET_RGB_STDS),
])

def preprocess(inputs):
    # inputs is a path to an image file
    with open(inputs, "rb") as img_file:
        img = Image.open(img_file)
        img = img.convert("RGB")
    img = preprocess_transforms(img)
    batch = torch.stack([img])
    return [batch.numpy()]  # deepsparse requires a list of numpy array inputs

def postprocess(outputs):
    return outputs  # list of numpy array outputs

custom_pipeline = CustomTaskPipeline(
    model_path="custom_model.onnx",
    process_inputs_fn=preprocess,
    process_outputs_fn=postprocess,
)
inference = custom_pipeline("basilica.jpg")
print(inference)
# Output:
# [array([[-5.64189434e+00, -2.78636312e+00, -2.62499309e+00, ...
```
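The postprocess function above returns the raw model outputs unchanged. For a classifier, a more useful hook might turn them into ranked class predictions. The following is a minimal sketch, assuming the first output holds logits of shape (batch, num_classes), which matches the nested array printed above. The name `postprocess_top_k` and the parameter `k` are hypothetical, and mapping class indices to label names is left out.

```python
import math


def softmax(logits):
    # Subtract the maximum first for numerical stability
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess_top_k(outputs, k=5):
    """Return (class_index, probability) pairs for the first item in the batch.

    Assumes outputs[0] has shape (batch, num_classes); works with nested
    lists as well as numpy arrays, since it only iterates over the values.
    """
    logits = [float(x) for x in outputs[0][0]]
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Toy logits for a 3-class model, batch size 1
toy_outputs = [[[1.0, 3.0, 2.0]]]
top_two = postprocess_top_k(toy_outputs, k=2)
print(top_two)
```

A function like this could be passed as process_outputs_fn in place of the pass-through postprocess, so the pipeline returns predictions rather than raw logits.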
The DeepSparse install includes a benchmark CLI, deepsparse.benchmark, for convenient inference performance benchmarking.
The CLI accepts either SparseZoo stubs or paths to a local ONNX file.
The code below provides an example for benchmarking the previously exported MobileNetV2 model. The output shows that the model achieved 441 items per second on a 4-core CPU.
```
$ deepsparse.benchmark custom_model.onnx
> DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.0.2 (7dc5fa34) (release) (optimized) (system=avx512, binary=avx512)
> Original Model Path: custom_model.onnx
> Batch Size: 1
> Scenario: async
> Throughput (items/sec): 441.2780
> Latency Mean (ms/batch): 4.5244
> Latency Median (ms/batch): 4.5054
> Latency Std (ms/batch): 0.0774
> Iterations: 4414
```
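As a rough sanity check, the reported metrics can be related to each other with simple arithmetic. The sketch below parses a few lines of the output above and applies Little's law (requests in flight = arrival rate × time per request). It assumes each iteration is one batch and that latency is measured per batch. The helper `parse_benchmark_metrics` is hypothetical and not part of the DeepSparse CLI.

```python
def parse_benchmark_metrics(text):
    """Collect 'Name: number' lines into a dict, skipping non-numeric values."""
    metrics = {}
    for line in text.splitlines():
        name, sep, value = line.strip().lstrip("> ").partition(": ")
        if not sep:
            continue
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # e.g. "Scenario: async" or a file path
    return metrics


# Lines copied from the benchmark output above
sample_output = """\
Original Model Path: custom_model.onnx
Batch Size: 1
Scenario: async
Throughput (items/sec): 441.2780
Latency Mean (ms/batch): 4.5244
Iterations: 4414
"""

metrics = parse_benchmark_metrics(sample_output)
batches_per_sec = metrics["Throughput (items/sec)"] / metrics["Batch Size"]
latency_sec = metrics["Latency Mean (ms/batch)"] / 1000.0

# Little's law: average number of batches being processed concurrently
in_flight = batches_per_sec * latency_sec
# Approximate wall-clock duration of the measured run
duration_sec = metrics["Iterations"] / batches_per_sec

print(f"batches in flight: {in_flight:.2f}")
print(f"run duration (s): {duration_sec:.2f}")
```

Under these assumptions the numbers work out to roughly two batches in flight and a run of about ten seconds. These are inferences from the arithmetic, not documented properties of the benchmark tool.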