This page walks through an example of deploying an object detection model with DeepSparse Server.
DeepSparse Server is a server wrapper around Pipelines, including the object detection pipeline. As such, the server provides an HTTP interface that accepts images and image files as inputs and outputs the labeled predictions. In this way, DeepSparse combines the simplicity of servable pipelines with GPU-class performance on CPUs for sparse models.
This example requires the DeepSparse Server+YOLO installation.
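If those components are not yet installed, they can be pulled in through pip extras; a minimal sketch, assuming the standard `server` and `yolo` extras of the deepsparse package:

```bash
pip install "deepsparse[server,yolo]"
```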
Before starting the server, the model must be set up in the format expected for DeepSparse Pipelines. See an example of how to set up Pipelines in the Use a Model section.
Once the Pipelines are set up, the deepsparse.server command launches a server hosting the model given by --model_path. The model_path can be either a SparseZoo stub or a path to a local model.onnx file.
The command below shows how to start up DeepSparse Server for a sparsified YOLOv5l model trained on the COCO dataset from the SparseZoo.
The output confirms the server was started on port :5543 with a /docs route for general info and a /predict/from_files route for inference.
```bash
$ deepsparse.server \
    --task "yolo" \
    --model_path "zoo:cv/detection/yolov5-l/pytorch/ultralytics/coco/pruned_quant-aggressive_95"
> deepsparse.server.main INFO created FastAPI app for inference serving
> deepsparse.server.main INFO created general routes, visit `/docs` to view available
> DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.1.0 COMMUNITY EDITION (a436ca67) (release) (optimized) (system=avx512_vnni, binary=avx512)
> deepsparse.server.main INFO created route /predict
> deepsparse.server.main INFO created route /predict/from_files
> INFO:uvicorn.error:Started server process [31382]
> INFO:uvicorn.error:Waiting for application startup.
> INFO:uvicorn.error:Application startup complete.
> INFO:uvicorn.error:Uvicorn running on http://0.0.0.0:5543 (Press CTRL+C to quit)
```
As noted in the startup output, a /docs route was created; it contains OpenAPI specs and definitions for the expected inputs and responses. Visiting http://localhost:5543/docs in a browser shows the available routes on the server.
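Because the server is built on FastAPI, the raw OpenAPI schema should also be retrievable directly from the command line; this sketch assumes the default FastAPI /openapi.json route has not been disabled:

```bash
curl http://localhost:5543/openapi.json
```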
The important one for object detection is the /predict/from_files POST route, which accepts a standard multipart files argument. The files argument enables uploading one or more image files for object detection processing.
With the expected input payload and method type defined, any HTTP request package can be used to make the request.
First, a sample image is downloaded for use with the sample request:
```bash
wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
```
Next, for simplicity and generality, the Python requests package is used to make a POST request to the /predict/from_files route on localhost:5543 with the downloaded file. The predicted outputs can then be printed out or used in a later pipeline.
```python
import requests
import json

resp = requests.post(
    url="http://localhost:5543/predict/from_files",
    files=[('request', open('basilica.jpg', 'rb'))]
)
print(resp.text)
```

```
> {"predictions":[[[175.6397705078125,487.64117431640625,346.1619873046875,616.2821655273438,0.8640249371528625,3.0],...
```
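To use the predictions in a later pipeline rather than just printing them, the JSON body can be parsed. A minimal sketch, assuming each inner prediction follows the [x0, y0, x1, y1, confidence, class_id] layout suggested by the sample output above:

```python
import json

# Parse the JSON body of the `resp` object from the request above
outputs = json.loads(resp.text)

for image_preds in outputs["predictions"]:
    for x0, y0, x1, y1, confidence, class_id in image_preds:
        # class_id indexes into the COCO label set the model was trained on
        print(f"class {int(class_id)}: box ({x0:.0f}, {y0:.0f}, {x1:.0f}, {y1:.0f}) "
              f"confidence {confidence:.2f}")
```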
After that request completes, the server also logs it as follows:
```
> INFO:uvicorn.error:Uvicorn running on http://0.0.0.0:5543 (Press CTRL+C to quit)
> INFO: 127.0.0.1:50604 - "POST /predict/from_files HTTP/1.1" 200 OK
```
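Finally, since /predict/from_files is a standard multipart POST route, the same request can be made without Python at all, for example with curl; the request field name here mirrors the Python example above:

```bash
curl -X POST http://localhost:5543/predict/from_files \
  -F 'request=@basilica.jpg'
```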