
Deploy a Text Classification Model

This page walks through an example of deploying a text-classification model with DeepSparse Server.

DeepSparse Server is a server wrapper around Pipelines, including the sentiment analysis pipeline. As such, the server provides an HTTP interface that accepts raw text sequences as inputs and responds with the labeled predictions. In this way, DeepSparse combines the simplicity of servable pipelines with GPU-class performance on CPUs for sparse models.

Install Requirements

This example requires DeepSparse Server to be installed: pip install deepsparse[server].

Start the Server

Before starting the server, the model must be set up in the format expected for DeepSparse Pipelines. See an example of how to set up Pipelines in the Use a Model section.

Once the Pipelines are set up, the deepsparse.server command launches a server hosting the model given by --model_path. The --model_path argument can be either a SparseZoo stub or a local model path.

The command below starts up DeepSparse Server for a sparsified DistilBERT model (from the SparseZoo) trained on the SST2 dataset for sentiment analysis. The output confirms the server was started on port :5543 with a /docs route for general info and a /predict route for inference.

$ deepsparse.server \
--task "sentiment-analysis" \
--model_path "zoo:nlp/sentiment_analysis/distilbert-none/pytorch/huggingface/sst2/pruned80_quant-none-vnni"
>deepsparse.server.main INFO created FastAPI app for inference serving
>deepsparse.server.main INFO created general routes, visit `/docs` to view available
>DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 1.1.0 COMMUNITY EDITION (a436ca67) (release) (optimized) (system=avx512_vnni, binary=avx512)
>deepsparse.server.main INFO created route /predict
>INFO:deepsparse.server.main:created route /predict
>INFO:uvicorn.error:Started server process [23146]
>INFO:uvicorn.error:Waiting for application startup.
>INFO:uvicorn.error:Application startup complete.
>INFO:uvicorn.error:Uvicorn running on (Press CTRL+C to quit)

View the Request Specs

As noted in the startup output, a /docs route was created; it contains OpenAPI specs and definitions for the expected inputs and responses. Visiting http://localhost:5543/docs in a browser shows the available routes on the server. For the /predict route specifically, it shows the following as the expected input schema:

description: Schema for inputs to text_classification pipelines
sequences* Sequences{
    description: A string or List of strings representing input to text_classification task
    anyOf ->
        [[string]]
        [string]
        string
}
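Per the anyOf in the schema, sequences accepts three shapes: a single string, a list of strings, or a list of string lists. A minimal sketch (plain Python; the example sentences are illustrative, not from the docs) that builds one payload of each shape and confirms it serializes to valid JSON:

```python
import json

# The three shapes the /predict schema accepts for "sequences" (per anyOf):
payloads = [
    {"sequences": "I love this product"},               # string
    {"sequences": ["I love this", "I hate that"]},      # [string]
    {"sequences": [["sentence one"], ["sentence two"]]},# [[string]]
]

for payload in payloads:
    body = json.dumps(payload)            # what an HTTP client would send
    assert json.loads(body) == payload    # round-trips as valid JSON
```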

Using the request spec, a valid input for sentiment analysis would be:

{
    "sequences": [
        "Snorlax loves my Tesla!"
    ]
}

Make a Request

With the expected input payload and method type defined, any HTTP request package can be used to make the request. For simplicity and generality, the curl command is used.

The code below makes a POST request to the /predict route on localhost:5543 with the JSON payload created above. The predicted outputs from the model are then printed in the terminal.

$ curl 'http://localhost:5543/predict' \
-H 'Content-Type: application/json' \
-d '{"sequences": ["Snorlax loves my Tesla!"]}'
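The same request can be made from Python using only the standard library. A hedged sketch with urllib (URL and payload taken from the curl command above; assumes the server is running locally):

```python
import json
import urllib.request

url = "http://localhost:5543/predict"
payload = {"sequences": ["Snorlax loves my Tesla!"]}

# Build a POST request with a JSON body, mirroring the curl command above.
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the server is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))  # labels and scores for each sequence
```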

After the request completes, the server also logs it as follows:

>INFO:uvicorn.error:Uvicorn running on (Press CTRL+C to quit)
>INFO: - "POST /predict HTTP/1.1" 200 OK