Skip to main content
Version: nightly

Deploying With DeepSparse on AWS Lambda

AWS Lambda is an event-driven, serverless computing infrastructure for deploying applications at minimal cost. Since DeepSparse runs on commodity CPUs, you can deploy DeepSparse on Lambda!

The DeepSparse GitHub repo contains a guided example for deploying a DeepSparse Pipeline on AWS Lambda for the sentiment analysis task.

The scope of this application encompasses:

  1. The construction of a local Docker image.
  2. The creation of an ECR repo in AWS.
  3. Pushing the local image to ECR.
  4. The creation of the appropriate IAM permissions for handling Lambda.
  5. The creation of a Lambda function alongside an API Gateway in a CloudFormation stack.

Requirements

The following credentials, tools, and libraries are also required:

  • The AWS CLI version 2.X that is configured. Double check if the region that is configured in your AWS CLI matches the region passed in the SparseLambda class found in the endpoint.py file. Currently, the default region being used is us-east-1.
  • The AWS Serverless Application Model (AWS SAM), an open-source CLI framework used for building serverless applications on AWS.
  • Docker and the docker cli.
  • The boto3 python AWS SDK: pip install boto3.

Quick Start

git clone https://github.com/neuralmagic/deepsparse.git
cd deepsparse/examples/aws-lambda
pip install -r requirements.txt

Model Configuration

To use a different sparse model please edit the model zoo stub in the Dockerfile. To change pipeline configuration (e.g., change task, engine), edit the pipeline object in the app.py file. Both files can be found in the /lambda-deepsparse/app directory.

Create Endpoint

Run the following command to build your Lambda endpoint.

python endpoint.py create

Call Endpoint

After the endpoint has been staged (~3 minutes), AWS SAM will provide your API Gateway endpoint URL in CLI. You can start making requests by passing this URL into the LambdaClient object. Afterward, you can run inference by passing in your text input:

from client import LambdaClient

LC = LambdaClient("https://#########.execute-api.us-east-1.amazonaws.com/inference")
answer = LC.client({"sequences": "i like pizza"})

print(answer)

answer: {'labels': ['positive'], 'scores': [0.9990884065628052]}

On your first cold start, it will take a ~30 seconds to get your first inference, but afterward, it should be in milliseconds.

Delete Endpoint

If you want to delete your Lambda endpoint, run:

python endpoint.py destroy