
Deploying with DeepSparse on AWS Lambda

AWS Lambda is an event-driven, serverless computing infrastructure for deploying applications at minimal cost. Since DeepSparse runs on commodity CPUs, you can deploy DeepSparse on Lambda!

The DeepSparse GitHub repo contains a guided example for deploying a DeepSparse Pipeline on AWS Lambda for the sentiment analysis task.

The scope of this application encompasses:

  1. The construction of a local Docker image.
  2. The creation of an ECR repo in AWS.
  3. Pushing the local image to ECR.
  4. The creation of the appropriate IAM permissions for handling Lambda.
  5. The creation of a Lambda function alongside an API Gateway in a CloudFormation stack.
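As a rough illustration of step 3, the sketch below composes the registry address that ECR uses for an image and the two Docker commands that tag and push it. It is not taken from the guided example: the account ID, repository name, and tag are hypothetical placeholders, and it assumes Docker has already been authenticated against the registry.

```python
from typing import List


def ecr_image_uri(account_id: str, region: str, repo: str, tag: str = "latest") -> str:
    """Compose the ECR address for an image: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def push_commands(local_image: str, remote_uri: str) -> List[List[str]]:
    """Return the docker commands (as argument lists) that tag and push a local image."""
    return [
        ["docker", "tag", local_image, remote_uri],
        ["docker", "push", remote_uri],
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    uri = ecr_image_uri("123456789012", "us-east-1", "lambda-deepsparse")
    for cmd in push_commands("lambda-deepsparse:latest", uri):
        print(" ".join(cmd))
```

Each argument list can be handed to subprocess.run if you want to drive the push from Python rather than from the shell.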


The following credentials, tools, and libraries are also required:

  • The AWS CLI version 2.X, installed and configured. Double-check that the region configured in your AWS CLI matches the region passed to the SparseLambda class found in the file. The default region is currently us-east-1.
  • The AWS Serverless Application Model (AWS SAM), an open-source CLI framework used for building serverless applications on AWS.
  • Docker and the Docker CLI.
  • The boto3 Python AWS SDK: pip install boto3.
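One way to check the region before deploying is sketched below. It is a simplified, standard-library-only check, not part of the guided example: it looks at the AWS_DEFAULT_REGION environment variable first, then at the region setting in ~/.aws/config, and ignores other sources the AWS CLI may consult. The function names are hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Optional

EXPECTED_REGION = "us-east-1"  # default region stated in this guide


def region_from_config_text(text: str, profile: str = "default") -> Optional[str]:
    """Extract the region for a profile from the contents of an AWS config file."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Named profiles are stored as "[profile NAME]"; the default profile is "[default]".
    section = "default" if profile == "default" else f"profile {profile}"
    if parser.has_section(section):
        return parser.get(section, "region", fallback=None)
    return None


def configured_region(profile: str = "default") -> Optional[str]:
    """Return the region from the environment or ~/.aws/config (simplified lookup)."""
    env_region = os.environ.get("AWS_DEFAULT_REGION")
    if env_region:
        return env_region
    config_path = Path.home() / ".aws" / "config"
    if config_path.exists():
        return region_from_config_text(config_path.read_text(), profile)
    return None


if __name__ == "__main__":
    region = configured_region()
    if region != EXPECTED_REGION:
        print(f"Configured region is {region!r}, expected {EXPECTED_REGION!r}")
```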

Quick Start

git clone
cd deepsparse/examples/aws-lambda
pip install -r requirements.txt

Model Configuration

To use a different sparse model, edit the model zoo stub in the Dockerfile. To change the pipeline configuration (e.g., the task or engine), edit the pipeline object in the file. Both files can be found in the /lambda-deepsparse/app directory.

Create Endpoint

Run the following command to build your Lambda endpoint.

python create

Call Endpoint

After the endpoint has been staged (~3 minutes), AWS SAM prints your API Gateway endpoint URL in the CLI. Pass this URL to the LambdaClient object to start making requests, then run inference by passing in your text input:

from client import LambdaClient

# Pass your API Gateway endpoint URL as the argument.
LC = LambdaClient("")
answer = LC.client({"sequences": "i like pizza"})

answer: {'labels': ['positive'], 'scores': [0.9990884065628052]}
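If you would rather call the endpoint without the example's client module, a request can be sketched with the standard library alone. This assumes the endpoint accepts a JSON body over HTTP POST and returns JSON; the actual LambdaClient implementation in the repo may differ, and the URL below is a hypothetical placeholder.

```python
import json
import urllib.request


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the endpoint (assumed request format)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical URL; replace it with the one AWS SAM prints.
    request = build_request("https://example.com/inference", {"sequences": "i like pizza"})
    print(request.get_method(), request.full_url)
```

The generous default timeout leaves room for the first request after deployment, which is much slower than later ones.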

On a cold start, the first inference takes ~30 seconds; subsequent requests should complete in milliseconds.
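To see the cold-start effect for yourself, you can time a few consecutive calls. The helper below is a hypothetical sketch that wraps any callable, such as the client call shown above, and records how long each invocation takes.

```python
import time
from typing import Any, Callable, List


def time_calls(call: Callable[[Any], Any], payload: Any, n: int = 3) -> List[float]:
    """Invoke call(payload) n times and return each latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        call(payload)
        latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    # Stand-in callable for illustration; swap in your endpoint call.
    for i, seconds in enumerate(time_calls(lambda p: p, {"sequences": "i like pizza"})):
        print(f"call {i}: {seconds:.6f} s")
```

When pointed at a freshly deployed endpoint, the first entry reflects the cold start and the rest reflect warm invocations.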

Delete Endpoint

If you want to delete your Lambda endpoint, run:

python destroy