Deploying With DeepSparse on AWS Lambda
AWS Lambda is an event-driven, serverless computing infrastructure for deploying applications at minimal cost. Since DeepSparse runs on commodity CPUs, you can deploy DeepSparse on Lambda!
The DeepSparse GitHub repo contains a guided example for deploying a DeepSparse Pipeline on AWS Lambda for the sentiment analysis task.
The scope of this application encompasses:
- The construction of a local Docker image.
- The creation of an ECR repo in AWS.
- Pushing the local image to ECR.
- The creation of the appropriate IAM permissions for handling Lambda.
- The creation of a Lambda function alongside an API Gateway in a CloudFormation stack.
Requirements
The following credentials, tools, and libraries are also required:
- The AWS CLI version 2.X, installed and configured. Double-check that the region configured in your AWS CLI matches the region passed to the SparseLambda class in the endpoint.py file. Currently, the default region is us-east-1.
- The AWS Serverless Application Model (AWS SAM), an open-source CLI framework used for building serverless applications on AWS.
- Docker and the docker CLI.
- The boto3 Python AWS SDK: pip install boto3
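The region check mentioned above can be scripted before you create the endpoint. The following is a minimal sketch, not part of the example repo: the helper names `region_matches` and `configured_region` are hypothetical, and it assumes the expected region is the default `us-east-1`. It uses `boto3.Session().region_name` when boto3 is installed and falls back to the `AWS_DEFAULT_REGION` environment variable otherwise.

```python
import os

EXPECTED_REGION = "us-east-1"  # default region noted in the requirements


def region_matches(configured, expected=EXPECTED_REGION):
    """Return True when the configured region equals the expected one."""
    return configured is not None and configured.strip().lower() == expected.lower()


def configured_region():
    """Best-effort lookup of the region boto3 would use (None if unset)."""
    try:
        import boto3
    except ImportError:
        # Assumption: without boto3, only the environment variable is consulted.
        return os.environ.get("AWS_DEFAULT_REGION")
    return boto3.Session().region_name


if __name__ == "__main__":
    region = configured_region()
    if region_matches(region):
        print(f"OK: AWS region is {region}")
    else:
        print(f"Mismatch: AWS region is {region!r}, expected {EXPECTED_REGION!r}")
```

If the script reports a mismatch, either reconfigure the AWS CLI or change the region passed to SparseLambda so the two agree.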
Quick Start
git clone https://github.com/neuralmagic/deepsparse.git
cd deepsparse/examples/aws-lambda
pip install -r requirements.txt
Model Configuration
To use a different sparse model, edit the model zoo stub in the Dockerfile.
To change the pipeline configuration (e.g., the task or engine), edit the pipeline object in the app.py file. Both files can be found in the /lambda-deepsparse/app directory.
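To make the role of the pipeline object more concrete, here is a hedged sketch of how a Lambda handler might wrap a DeepSparse pipeline. It is not the actual contents of app.py, which may differ. The helper names `build_pipeline` and `make_handler` are hypothetical, and the sketch assumes the event carries its JSON payload as a string under a `"body"` key. `Pipeline.create(task=..., model_path=...)` is the DeepSparse call for constructing a pipeline; the model path is left as a parameter rather than guessed.

```python
import json


def build_pipeline(model_path, task="sentiment-analysis"):
    """Create a DeepSparse pipeline (requires the deepsparse package)."""
    # Imported lazily so the handler logic below can be exercised without deepsparse.
    from deepsparse import Pipeline

    return Pipeline.create(task=task, model_path=model_path)


def make_handler(pipeline):
    """Wrap a pipeline-like callable in a Lambda-style handler function."""

    def handler(event, context=None):
        # Assumption: the request payload arrives as a JSON string in event["body"].
        payload = json.loads(event["body"])
        result = pipeline(**payload)
        # Pipeline outputs are typically pydantic models; plain dicts pass through as-is.
        body = result.dict() if hasattr(result, "dict") else result
        return {"statusCode": 200, "body": json.dumps(body)}

    return handler
```

Building the pipeline once, outside the handler, means the model is loaded during the cold start and reused by later invocations of the same container.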
Create Endpoint
Run the following command to build your Lambda endpoint.
python endpoint.py create
Call Endpoint
After the endpoint has been staged (~3 minutes), AWS SAM will print your API Gateway endpoint URL in the CLI. You can start making requests by passing this URL to the LambdaClient object, and then run inference by passing in your text input:
from client import LambdaClient
LC = LambdaClient("https://#########.execute-api.us-east-1.amazonaws.com/inference")
answer = LC.client({"sequences": "i like pizza"})
print(answer)
answer: {'labels': ['positive'], 'scores': [0.9990884065628052]}
On the first cold start, it takes ~30 seconds to get your first inference; subsequent requests should complete in milliseconds.
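If you prefer to call the endpoint without the bundled client, the request can be sketched with the Python standard library. This is an illustration only and is not how LambdaClient is necessarily implemented: it assumes the endpoint accepts a JSON body via POST, and the function names `build_request` and `infer` are hypothetical. The generous default timeout is there to leave room for the cold start described above.

```python
import json
import urllib.request


def build_request(url, payload):
    """Build a JSON POST request for the inference endpoint."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the API Gateway route accepts POST
    )


def infer(url, payload, timeout=60):
    """Send the payload and decode the JSON reply."""
    request = build_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `infer(endpoint_url, {"sequences": "i like pizza"})`, where `endpoint_url` is the URL printed by AWS SAM.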
Delete Endpoint
If you want to delete your Lambda endpoint, run:
python endpoint.py destroy