Version: nightly

Deploying With DeepSparse Server

This section explains how to deploy with DeepSparse Server.

Installation Requirements

This use case requires the installation of DeepSparse Server.

Usage

DeepSparse Server allows you to serve models and Pipelines for deployment in HTTP. The server runs on top of the popular FastAPI web framework and Uvicorn web server. The server supports any task from DeepSparse, such as Pipelines including NLP, image classification, and object detection tasks. An updated list of available tasks can be found in the DeepSparse Pipelines Introduction.

Run the help CLI to look up the available arguments.

$ deepsparse.server --help

> Usage: deepsparse.server [OPTIONS] COMMAND [ARGS]...
>
>   Start a DeepSparse inference server for serving the models and pipelines.
>
>       1. `deepsparse.server config [OPTIONS] <config path>`
>
>       2. `deepsparse.server task [OPTIONS] <task>
>
>   Examples for using the server:
>
>       `deepsparse.server config server-config.yaml`
>
>       `deepsparse.server task question_answering --batch-size 2`
>
>       `deepsparse.server task question_answering --host "0.0.0.0"`
>
>   Example config.yaml for serving:
>
>   \```yaml
>   num_cores: 2
>   num_workers: 2
>   endpoints:
>     - task: question_answering
>       route: /unpruned/predict
>       model: zoo:some/zoo/stub
>     - task: question_answering
>       route: /pruned/predict
>       model: /path/to/local/model
>   \```
>
> Options:
>   --help  Show this message and exit.
>
> Commands:
>   config  Run the server using configuration from a .yaml file.
>   task    Run the server using configuration with CLI options, which can...

Single Model Inference

Here is an example CLI command for serving a single model for the question answering task:

deepsparse.server \
    task question_answering \
    --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"

To make a request to your server, use the requests library and pass the request URL:

import requests

url = "http://localhost:5543/predict"

obj = {
    "question": "Who is Mark?",
    "context": "Mark is batman."
}

response = requests.post(url, json=obj)

In addition, you can make a request with a curl command from the terminal:

curl -X POST \
  'http://localhost:5543/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "question": "Who is Mark?",
  "context": "Mark is batman."
}'

Multiple Model Inference

To serve multiple models, you can build a config.yaml file. In the sample YAML file below, we are defining two BERT models to be served by the deepsparse.server for the question answering task:

num_cores: 2
num_workers: 2
endpoints:
    - task: question_answering
      route: /unpruned/predict
      model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none
      batch_size: 1
    - task: question_answering
      route: /pruned/predict
      model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni
      batch_size: 1

You can now run the server with the configuration file path using the config subcommand:

deepsparse.server config config.yaml

You can send requests to a specific model by appending the model's alias from the config.yaml to the end of the request url. For example, to call the second model, you can send a request to its configured route:

import requests

url = "http://localhost:5543/pruned/predict"

obj = {
    "question": "Who is Mark?",
    "context": "Mark is batman."
}

response = requests.post(url, json=obj)

PRO TIP: While your server is running, you can always use the awesome swagger UI that's built into FastAPI to view your model's pipeline POST routes. The UI also enables you to easily make sample requests to your server. All you need is to add /docs at the end of your host URL:

localhost:5543/docs

Deploying With DeepSparse Server

Installation Requirements

Usage

Single Model Inference

Multiple Model Inference

Content

Actions

Support

Issues

Deploying With DeepSparse Server

Installation Requirements​

Usage​

Single Model Inference​

Multiple Model Inference​

Content

Actions

Support

Issues

Installation Requirements

Usage

Single Model Inference

Multiple Model Inference