Amazon SageMaker offers an easy-to-use infrastructure for deploying deep learning models at scale. This directory provides a guided example for deploying a DeepSparse inference server on SageMaker for the question answering NLP task. Deployments benefit from both sparse-CPU acceleration with DeepSparse and automatic scaling from SageMaker.
The listed steps can be easily completed using `python` and `bash`. The following credentials, tools, and libraries are also required:
- An AWS CLI configuration whose `region` matches the region in the `SparseMaker` class found in the `endpoint.py` file. Currently, the default region being used is `us-east-1`.
- An `AmazonSageMakerFullAccess` role ARN, referred to below as `ROLE_ARN`. It should take the form `"arn:aws:iam::XXX:role/service-role/XXX"`. In addition to role permissions, make sure the AWS user who configured the AWS CLI has ECR/SageMaker permissions.
- The `docker` CLI.
- The `boto3` Python AWS SDK (`pip install boto3`).
Clone the repository and install the requirements:

```bash
git clone https://github.com/neuralmagic/deepsparse.git
cd deepsparse/examples/aws-sagemaker
pip install -r requirements.txt
```
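Because the requirements above assume the AWS CLI region matches the one used by `SparseMaker`, it can help to confirm what your local configuration actually resolves to. A small optional check with `boto3` (not part of the example files):

```python
import boto3

# Shows the region and account picked up from your AWS CLI configuration;
# the region should match the one used in the SparseMaker class in endpoint.py.
session = boto3.Session()
print("Region:", session.region_name)
print("Account:", boto3.client("sts").get_caller_identity()["Account"])
```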
Before starting, replace the `role_arn` PLACEHOLDER string with your AWS ARN at the bottom of the `SparseMaker` class in the `endpoint.py` file. Your ARN should look something like this: `"arn:aws:iam::XXX:role/service-role/XXX"`
Run the following command to build your SageMaker endpoint.
```bash
python endpoint.py create
```
After the endpoint has been staged (~1 minute), you can start making requests by passing your endpoint region name and your endpoint name. Afterwards, you can run inference by passing in your question and context:
```python
from qa_client import Endpoint

qa = Endpoint("us-east-1", "question-answering-example-endpoint")
answer = qa.predict(question="who is batman?", context="Mark is batman.")

print(answer)
```
The answer is: `b'{"score":0.6484262943267822,"answer":"Mark","start":0,"end":4}'`
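Since the prediction comes back as raw JSON bytes, you may want to decode it into a Python dict before using the fields. A small optional sketch, assuming `answer` holds the bytes shown above:

```python
import json

# json.loads accepts the raw bytes payload returned by the client directly.
result = json.loads(answer)
print(result["answer"], result["score"])
```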
If you want to delete your endpoint, use:
```bash
python endpoint.py destroy
```
Continue reading to learn more about the files in this directory, the build requirements, and a descriptive step-by-step guide for launching a SageMaker endpoint.
In addition to the step-by-step instructions below, the directory contains files to aid in the deployment.
The included `Dockerfile` builds an image on top of the standard `python:3.8` image with `deepsparse` installed, and creates an executable command `serve` that runs `deepsparse.server` on port 8080. SageMaker will execute this image by running `docker run serve` and expects the image to serve inference requests at the `invocations/` endpoint.
For general customization of the server, changes should not need to be made to the `Dockerfile`; instead, edit the `config.yaml` file from which the Dockerfile reads.
`config.yaml` is used to configure the DeepSparse Server running in the Docker image. The configuration must contain the line `integration: sagemaker` so that endpoints are provisioned correctly to match SageMaker specifications.
Notice that the `model_path` and `task` are set to run a sparse-quantized question answering model from SparseZoo. To use a model directory stored in `s3`, set `model_path` to `/opt/ml/model` in the configuration and add `ModelDataUrl=<MODEL-S3-PATH>` to the `CreateModel` arguments. SageMaker will automatically copy the files from the s3 path into `/opt/ml/model`, from which the server can then read.
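To make the S3 option concrete, the snippet below sketches how `ModelDataUrl` can be attached to the container definition when calling `CreateModel` through `boto3`. The bucket path, model name, image URI, and role ARN here are illustrative placeholders, not files shipped with this example:

```python
import boto3

sm_boto3 = boto3.client("sagemaker", region_name="us-east-1")

# Placeholders: the ECR image URI built later in this guide and the role ARN
# described in the requirements section.
image_uri = "XXX.dkr.ecr.us-east-1.amazonaws.com/deepsparse-sagemaker:latest"
role_arn = "arn:aws:iam::XXX:role/service-role/XXX"

# Hypothetical S3 location of a model.tar.gz containing the model directory;
# SageMaker copies and extracts it into /opt/ml/model inside the container,
# which is where the server's model_path should point.
model_data_url = "s3://my-bucket/question-answering/model.tar.gz"

create_model_res = sm_boto3.create_model(
    ModelName="question-answering-s3-example",
    Containers=[
        {
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
        },
    ],
    ExecutionRoleArn=role_arn,
)
```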
This is a Bash script for pushing your local Docker image to the AWS ECR repository.
This file contains the `SparseMaker` object for automating the build of a SageMaker endpoint from a Docker image. You can customize the parameters of the class to match the preferred state of your deployment.
This file contains a client object for making requests to the SageMaker inference endpoint for the question answering task.
Review DeepSparse Server for more information about the server and its configuration.
The following steps are required to provision and deploy DeepSparse to SageMaker for inference:
1. Build the `Dockerfile` into a local docker image.
2. Create an Amazon ECR repository to host the image.
3. Push the image to the ECR repository.
4. Create a SageMaker `Model` that reads from the hosted ECR image.
5. Build a SageMaker `EndpointConfig` that defines how to provision the model deployment.
6. Launch the SageMaker `Endpoint` defined by the `Model` and `EndpointConfig`.

Build the `Dockerfile` from this directory from a bash shell using the following command. The image will be tagged locally as `deepsparse-sagemaker-example`.
```bash
docker build -t deepsparse-sagemaker-example .
```
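Optionally, before pushing anything to AWS, you can sanity-check the image locally. The sketch below assumes the container is started with the same `serve` command SageMaker uses and that port 8080 is published; the `requests` library may need to be installed separately, as it is not required by this example:

```python
# First start the container in another shell, publishing the server port:
#   docker run --rm -p 8080:8080 deepsparse-sagemaker-example serve
import requests

payload = {"question": "who is batman?", "context": "Mark is batman."}

# The image serves inference requests at the invocations/ endpoint.
res = requests.post("http://localhost:8080/invocations", json=payload)
print(res.status_code, res.text)
```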
Use the following code snippet in Python to create an ECR repository. The `region_name` can be swapped to a preferred region. The repository will be named `deepsparse-sagemaker`. If the repository is already created, you may skip this step.
```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")
create_repository_res = ecr.create_repository(repositoryName="deepsparse-sagemaker")
```
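If you need the repository URI later (for example, to tag the image by hand instead of using the push script), it can be looked up with a small convenience snippet like the following; this is not part of the example files:

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")

# The URI is also returned by create_repository under repository.repositoryUri;
# describe_repositories works whether the repository was just created or
# already existed.
repo_uri = ecr.describe_repositories(repositoryNames=["deepsparse-sagemaker"])[
    "repositories"
][0]["repositoryUri"]
print(repo_uri)  # e.g. XXX.dkr.ecr.us-east-1.amazonaws.com/deepsparse-sagemaker
```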
Once the image is built and the ECR repository is created, you can push the image using the following bash commands.
```bash
account=$(aws sts get-caller-identity --query Account | sed -e 's/^"//' -e 's/"$//')
region=$(aws configure get region)
ecr_account=${account}.dkr.ecr.${region}.amazonaws.com

aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $ecr_account
fullname=$ecr_account/deepsparse-sagemaker:latest

docker tag deepsparse-sagemaker-example:latest $fullname
docker push $fullname
```
An abbreviated successful output will look like:
```
Login Succeeded
The push refers to repository [XXX.dkr.ecr.us-east-1.amazonaws.com/deepsparse-example]
3c2284f66840: Preparing
08fa02ce37eb: Preparing
a037458de4e0: Preparing
bafdbe68e4ae: Preparing
a13c519c6361: Preparing
6817758dd480: Waiting
6d95196cbe50: Waiting
e9872b0f234f: Waiting
c18b71656bcf: Waiting
2174eedecc00: Waiting
03ea99cd5cd8: Pushed
585a375d16ff: Pushed
5bdcc8e2060c: Pushed
latest: digest: sha256:XXX size: 3884
```
Create a SageMaker `Model` referencing the pushed image. The example model will be named `question-answering-example`. As mentioned in the requirements, `ROLE_ARN` should be the string ARN of an AWS role with full access to SageMaker.
```python
import boto3

sm_boto3 = boto3.client("sagemaker", region_name="us-east-1")

region = boto3.Session().region_name
account_id = boto3.client("sts").get_caller_identity()["Account"]

image_uri = "{}.dkr.ecr.{}.amazonaws.com/deepsparse-sagemaker:latest".format(account_id, region)

create_model_res = sm_boto3.create_model(
    ModelName="question-answering-example",
    Containers=[
        {
            "Image": image_uri,
        },
    ],
    ExecutionRoleArn=ROLE_ARN,
    EnableNetworkIsolation=False,
)
```
Refer to AWS documentation for more information about options for configuring SageMaker `Model` instances.
The `EndpointConfig` is used to set the instance type to provision, how many instances, scaling rules, and other deployment settings. The following code snippet defines an endpoint with a single machine using an `ml.c5.2xlarge` CPU instance.
```python
model_name = "question-answering-example"  # model defined above
initial_instance_count = 1
instance_type = "ml.c5.2xlarge"  # 8 vcpus

variant_name = "QuestionAnsweringDeepSparseDemo"  # ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}

production_variants = [
    {
        "VariantName": variant_name,
        "ModelName": model_name,
        "InitialInstanceCount": initial_instance_count,
        "InstanceType": instance_type,
    }
]

endpoint_config_name = "QuestionAnsweringExampleConfig"  # ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}

endpoint_config = {
    "EndpointConfigName": endpoint_config_name,
    "ProductionVariants": production_variants,
}

endpoint_config_res = sm_boto3.create_endpoint_config(**endpoint_config)
```
Once the `EndpointConfig` is defined, launch the endpoint using the `create_endpoint` command:
```python
endpoint_name = "question-answering-example-endpoint"
endpoint_res = sm_boto3.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
```
After creating the endpoint, you can check its status by running the following. Initially, the `EndpointStatus` will be `Creating`. Once the image has successfully launched, it will be `InService`. If there are any errors, it will be `Failed`.
```python
from pprint import pprint

pprint(sm_boto3.describe_endpoint(EndpointName=endpoint_name))
```
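If you would rather block until the endpoint is ready instead of polling `describe_endpoint` by hand, `boto3` also ships a waiter for this; a small optional sketch:

```python
# Blocks until the endpoint reaches InService (or raises if it fails),
# polling describe_endpoint on an interval under the hood.
waiter = sm_boto3.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=endpoint_name)
```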
After the endpoint is in service, you can make requests to it through the `invoke_endpoint` API. Inputs will be passed as a JSON payload.
```python
import json

sm_runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

body = json.dumps(
    dict(
        question="Where do I live?",
        context="I am a student and I live in Cambridge",
    )
)

content_type = "application/json"
accept = "text/plain"

res = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=body,
    ContentType=content_type,
    Accept=accept,
)

print(res["Body"].readlines())
```
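The response body is a stream of raw JSON bytes, the same shape as the quickstart answer above. If you prefer a Python dict over the raw lines, you can parse the stream instead of printing it; a small optional variation:

```python
import json

# Alternative to the readlines() print above: read the streaming body once
# and decode the JSON answer into a dict.
prediction = json.loads(res["Body"].read())
print(prediction["answer"], prediction["score"])
```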
You can delete the model and endpoint with the following commands:
```python
sm_boto3.delete_endpoint(EndpointName=endpoint_name)
sm_boto3.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_boto3.delete_model(ModelName=model_name)
```
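If the ECR repository was created solely for this example, it can be removed as well once the endpoint is gone. This step is optional and permanently deletes the pushed image:

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")
# force=True deletes the repository even though it still contains images.
ecr.delete_repository(repositoryName="deepsparse-sagemaker", force=True)
```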
These steps create an invokable SageMaker inference endpoint powered by DeepSparse.
The `EndpointConfig` settings may be adjusted to set instance scaling rules based on deployment needs.
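As one illustration of such scaling rules, the endpoint's production variant can be registered with AWS Application Auto Scaling so the instance count tracks request load. The snippet below is a hedged sketch using the standard `application-autoscaling` APIs; the capacity bounds, target value, and policy name are illustrative choices, not part of this example:

```python
import boto3

autoscaling = boto3.client("application-autoscaling", region_name="us-east-1")

# The scalable resource is the production variant created in the EndpointConfig.
resource_id = (
    "endpoint/question-answering-example-endpoint"
    "/variant/QuestionAnsweringDeepSparseDemo"
)

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=2,
)

# Scale out/in to keep invocations per instance near the target value.
autoscaling.put_scaling_policy(
    PolicyName="question-answering-example-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)
```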
Refer to AWS documentation for more information on deploying custom models with SageMaker.