AWS Lambda is an event-driven, serverless computing infrastructure for deploying applications at minimal cost. Since DeepSparse runs on commodity CPUs, you can deploy DeepSparse on Lambda!
The DeepSparse GitHub repo contains a guided example for deploying a DeepSparse Pipeline on AWS Lambda for the sentiment analysis task.
The scope of this application encompasses:
The following credentials, tools, and libraries are also required:

- The AWS CLI, with credentials configured. Make sure the `region` configured in your AWS CLI matches the region passed to the `SparseLambda` class found in the `endpoint.py` file. Currently, the default region being used is `us-east-1`.
- Docker and the `docker` CLI.
- `boto3`, the Python AWS SDK: `pip install boto3`.
```bash
git clone https://github.com/neuralmagic/deepsparse.git
cd deepsparse/examples/aws-lambda
pip install -r requirements.txt
```
To use a different sparse model, edit the model zoo stub in the `Dockerfile`. To change the pipeline configuration (e.g., task or engine), edit the pipeline object in the `app.py` file. Both files can be found in the `/lambda-deepsparse/app` directory.
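For orientation, a DeepSparse pipeline object is typically constructed along these lines (a configuration sketch only; the `model_path` stub below is a placeholder, not the stub shipped in the example's `app.py`):

```python
from deepsparse import Pipeline

# Sketch of a pipeline configuration: changing `task` or `model_path`
# (a SparseZoo stub or a local ONNX path) is how you swap tasks/models.
pipeline = Pipeline.create(
    task="sentiment-analysis",
    model_path="zoo:example/placeholder-stub",  # placeholder, edit to your stub
)
```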
Run the following command to build your Lambda endpoint.
```bash
python endpoint.py create
```
After the endpoint has been staged (~3 minutes), AWS SAM will print your API Gateway endpoint URL in the CLI. You can start making requests by passing this URL into the `LambdaClient` object, then run inference by passing in your text input:
```python
from client import LambdaClient

LC = LambdaClient("https://#########.execute-api.us-east-1.amazonaws.com/inference")
answer = LC.client({"sequences": "i like pizza"})

print(answer)
```
```
answer: {'labels': ['positive'], 'scores': [0.9990884065628052]}
```
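If you prefer not to use the provided client, the endpoint can also be hit with a plain HTTP POST. A stdlib sketch of building that request (the `{"sequences": ...}` payload shape is inferred from the client call above; the URL is a placeholder you must replace with your own):

```python
import json
import urllib.request

# Placeholder -- substitute the API Gateway URL printed by AWS SAM.
URL = "https://example.execute-api.us-east-1.amazonaws.com/inference"

def build_request(text):
    """Build a JSON POST request with the payload shape the client uses."""
    body = json.dumps({"sequences": text}).encode("utf-8")
    return urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_request("i like pizza")
print(req.get_method())  # POST, because a data body is attached
```

Sending it is then `urllib.request.urlopen(req)`, which returns the JSON response body shown above.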
On your first cold start, it will take ~30 seconds to get your first inference; afterwards, responses should arrive in milliseconds.
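You can observe the cold-start vs. warm-request difference yourself with a small timing helper (a stdlib sketch; `LC.client` from the snippet above is the call you would actually pass in):

```python
import time

def timed(call, *args, **kwargs):
    """Return (result, seconds) for one invocation.

    Call it once for the cold start and again for warm requests
    to compare latencies.
    """
    start = time.perf_counter()
    result = call(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in function for illustration; in practice use:
#   timed(LC.client, {"sequences": "i like pizza"})
result, seconds = timed(lambda payload: {"echo": payload}, {"sequences": "i like pizza"})
print(result, f"{seconds:.6f}s")
```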
If you want to delete your Lambda endpoint, run:
```bash
python endpoint.py destroy
```