Refactoring Our Lambdas Because Python Has a Bloating Problem
At Fenris, we use AWS Lambdas to support our Machine Learning and Inference pipelines. Specifically, we build many of our APIs on inference models that score leads for our customers. With the sunsetting of Python 3.6, we recently had to update some of our functions. Many of the Lambdas we hoped to upgrade required dependencies such as pandas, numpy, scipy, and scikit-learn. These packages are significantly larger in Python 3.9 and above. The nefarious dependency bloat strikes again!! The issue here is that AWS enforces a 250 MB size limit on the unzipped code and packages of a zip-deployed function. In a few cases where many of these packages were required, we weren’t able to meet this size limit. Perhaps a greater size limit (associated with greater user cost) could help to solve this challenge for many users. However, since that is not currently available, we turned to some other options.
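As a rough illustration of the size problem, here is a minimal sketch that measures a directory of installed dependencies against the 250 MB unzipped limit. The function names are hypothetical and not part of our tooling, and the sketch assumes the dependencies were first installed into a local folder.

```python
import os

# Lambda's limit for an unzipped zip-based deployment package: 250 MB.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_zip_lambda(root, limit=LAMBDA_UNZIPPED_LIMIT_BYTES):
    """Return True if the directory fits under the unzipped size limit."""
    return directory_size_bytes(root) <= limit
```

For example, after running pip install -r requirements.txt --target build/ (the build/ folder name is an assumption), fits_in_zip_lambda("build") gives a quick answer on whether a zip deployment is still feasible.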
Options
When tackling this challenge, we considered the following options:
- Mount an EFS volume with the necessary packages on each Lambda that needed extra storage space for dependencies
- Shrink the large dependencies by stripping out code that wasn’t necessary for our purposes
- Deploy Lambdas via container images through AWS ECR (Elastic Container Registry)
We eventually ruled out the first option after exploring the complexities of automating the process of uploading dependencies to an EFS volume. Specifically, uploading / updating the files on an EFS volume must be done with an EC2 instance, which was complex to build within our CI/CD systems. EFS makes more sense for exploratory and ad-hoc projects. We quickly ruled out the second approach due to the high maintenance cost of customized reduced packages. The following explains how we went about solving our problem with the containers approach.
Overview
The rest of this post is dedicated to explaining how we went about our containerized Lambda approach.
The general procedure that we followed:
- Building an image
- Manually retagging said image
- Deploying our function via SAM, which handles the publishing of our image to ECR
Getting Started:
After opening up the project for the Lambda function that you’re hoping to upgrade, make sure you are in a virtual environment that reflects the new Python version you’re planning on upgrading to. Install the appropriate updates to requirements and make sure those changes are reflected in your requirements.txt file.
One easy way to do that is to install pur via pip install --upgrade pur. pur reads requirements files, finds the latest packages for the current virtual environment’s runtime, and rewrites the requirements.txt file to contain references to those latest package versions. Find usage instructions here.
File Structure
The screen capture below shows the file structure used for the rest of the article.
Lambda Configuration
- You’ll have to add a Dockerfile to your project directory. This is what SAM uses to create and configure your container images. The Dockerfile should contain something similar to the following.

    FROM public.ecr.aws/lambda/python:3.9
    RUN yum update -y --security
    COPY /project_dir/* ./
    COPY requirements.txt ./
    RUN python3.9 -m pip install -r requirements.txt
    # Command can be overwritten by providing a different command in the template directly.
    CMD ["overridden-command"]
- At the top of your SAM lambda template, you should include the following under the Parameters section. This value will be specified in your deployment steps.

    CommitTag:
      Type: String
      Description: commit tag to be associated with a deployed image
Assuming your functions are being deployed via ZIP files with SAM (as ours were), you should do the following:
- For each function in your SAM template, you’ll want to remove the following. Each function will have some or all of the below features under the Properties section:

    CodeUri: project_dir/
    Handler: handler.lambda_handler
    Runtime: python3.6
- Remove any layers listed for each function as well.
- You will need to add the following.
Note: PackageType and ImageConfig are under the Properties section. The Metadata section is not part of the Properties section; it sits at the same level as Properties. This is discussed shortly.

    Properties:  # the function's existing Properties section
      PackageType: Image
      ImageConfig:
        Command: [ "handler.lambda_handler" ]
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./
      DockerTag: !Ref CommitTag # specify CommitTag as a parameter at the top of the template
- PackageType should be set to Image.
- The Command parameter under ImageConfig should be set to the path of your Lambda handler function.
- Under the Metadata section:
  - Dockerfile should be set to the name of the Dockerfile you created in your project directory. Dockerfile is the standard naming convention.
  - DockerContext should be set to the name of the directory that contains your requirements.txt, Dockerfile, and project code folders.
  - DockerTag should remain !Ref CommitTag.
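To make the Command value concrete: "handler.lambda_handler" follows the module.function convention, so it points at a function named lambda_handler in a handler.py file copied into the image’s working directory. Below is a minimal sketch of such a file. The event shape and response body are purely illustrative assumptions and are not our production handler.

```python
import json


def lambda_handler(event, context):
    """Entry point referenced as "handler.lambda_handler" in ImageConfig.Command."""
    # Hypothetical behaviour: report which keys the incoming event carried.
    body = {"received_keys": sorted(event.keys())}
    return {"statusCode": 200, "body": json.dumps(body)}
```

If the handler lives in a different module or function, the Command value changes accordingly, for example "scoring.handle" for a handle function in scoring.py (both names hypothetical).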
Challenges:
Though most of the challenges with this process were associated with the deployment process and AWS permissions configuration, we did encounter some missteps with the Lambda configuration as well. Most notably, make sure the DockerContext specified in the Metadata section is a directory that contains the following:
- The Dockerfile
- The requirements.txt file that specifies project dependencies
- The Lambda code
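A small pre-flight check can catch a wrong DockerContext before a build is attempted. The sketch below is an assumption about how one might do this, not part of SAM or of our pipeline. The function name is hypothetical, and project_dir is the code folder name used in the Dockerfile example earlier.

```python
from pathlib import Path


def missing_from_context(context_dir, code_dir="project_dir",
                         required_files=("Dockerfile", "requirements.txt")):
    """Return the names that should be in the Docker context but are not."""
    root = Path(context_dir)
    # Files the image build needs at the top level of the context.
    missing = [name for name in required_files if not (root / name).is_file()]
    # The folder holding the Lambda code that the Dockerfile copies in.
    if not (root / code_dir).is_dir():
        missing.append(code_dir + "/")
    return missing
```

Calling missing_from_context("./") from the repository root returns an empty list when the context is laid out as described above.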
Deployment Configuration:
We recognize that deployment steps are often unique to a project/organization, so rather than providing all of the details of our deployment logic, we’ve provided the following example of the core of our dev deployment job for containerized lambdas. We are a GitLab shop, but this deployment pattern should be easily transferable to other deployment systems such as GitHub Actions. Here’s what our dev-deployment step looks like:
    .deploy_containerized_lambda: &deploy_containerized_lambda
      - ECR_REPOSITORY="${ACCOUNT_NUMBER}.dkr.ecr.us-east-1.amazonaws.com"
      - aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ECR_REPOSITORY
      - sam build --template sam-lambdas.yaml --no-cached --parameter-overrides CommitTag=$DOCKER_IMAGE_TAG
      - sam deploy --image-repository="$ECR_REPOSITORY/$REPOSITORY_NAME" --role-arn $CLOUDFORMATION_EXECUTION_ROLE --stack-name ${STACK_NAME} --s3-prefix ${ENV} --parameter-overrides Environment=${ENV} CommitTag=$DOCKER_IMAGE_TAG --region $REGION --no-fail-on-empty-changeset
      - docker rmi $(docker images --filter=reference="*:$DOCKER_IMAGE_TAG" -q) --force

    deploy-dev:
      resource_group: dev
      environment: dev
      image: docker:latest
      stage: deploy
      needs: ["pytest"]
      services:
        - docker:dind
      script:
        - export ENV="dev"
        - *git_install
        - *setup_aws_sam_tools
        - *assume_cloudformation_execution_role
        - aws cloudformation deploy --template-file ./path-to-infrastructure-template.yaml --role-arn $CLOUDFORMATION_EXECUTION_ROLE --no-fail-on-empty-changeset --stack-name ${STACK_NAME}-infrastructure --parameter-overrides Environment=${ENV} RepositoryName=$REPOSITORY_NAME --region $REGION
        - *deploy_containerized_lambda
      only:
        - branches
      except:
        refs:
          - main
Important notes:
- Variables needing clarification:
  - ACCOUNT_NUMBER: AWS account number
  - DOCKER_IMAGE_TAG: set to "commit-tag-${CI_COMMIT_SHORT_SHA}"
  - REGION: AWS region
- Line 18 (*git_install): installing our templates repository so we can access the ECR repo CFN template
- Line 19 (*setup_aws_sam_tools): installing aws_sam_cli, pip, etc.
- Line 20 (*assume_cloudformation_execution_role): assume the CloudFormation deployment role (enabled with the necessary permissions) for the following deployment steps
- Line 21 (aws cloudformation deploy): deploy the ECR repo via the ECR CFN template
- Line 22 (*deploy_containerized_lambda): see the deploy hook listed first
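The DOCKER_IMAGE_TAG value above is simply a fixed prefix plus the short commit SHA that GitLab exposes as CI_COMMIT_SHORT_SHA. As a sketch of the same idea outside of CI, the hypothetical helper below builds the tag and checks it against Docker’s tag rules (at most 128 characters drawn from letters, digits, underscores, periods, and dashes, not starting with a period or dash). The helper name and the validation step are assumptions added for illustration.

```python
import re

# Docker tag rules: 1 to 128 characters from [A-Za-z0-9_.-], and the first
# character may not be a period or a dash.
_TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def image_tag(short_sha, prefix="commit-tag-"):
    """Build an image tag like the DOCKER_IMAGE_TAG used in the deploy job."""
    tag = f"{prefix}{short_sha}"
    if not _TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"invalid Docker tag: {tag!r}")
    return tag
```

Tagging each image with the commit it was built from makes it easy to trace a deployed function back to the code that produced it.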
Quick fixes to some issues:
- If you add any tasks to the after_script, you must redefine any environment variables such as STACK_NAME or ENV, because after_script runs in a separate shell and does not see variables exported in script.
- If you want to reference any of the helper jobs used via YAML hooks in the gitlab-ci-templates repository, you can use the following syntax: - !reference [.helper_job_name]. This is equivalent to using - *helper_job_name for helper jobs defined in the project’s gitlab-ci.yml file.
Finally, remember: Lambda functions that have already been deployed as zip files cannot be converted in place into functions deployed via containers. You must either delete the zip functions first and then deploy the containerized functions, or rename the stacks and deploy the new functions.
Challenges:
The most challenging part of this whole process was configuring permissions appropriately, especially across accounts. We ran into cross-account permissions issues when deploying lambdas in our prod account. This could also be an issue when building systems in any account other than the one in which functions are initially deployed. The key permissions needed for the IAM role assumed by our CI/CD process when running the deploy step, for both the ECR infrastructure and the lambda deployments, were:

    "ecr:SetRepositoryPolicy",
    "ecr:GetRepositoryPolicy",
    "ecr:DescribeRepositories",
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetAuthorizationToken",
    "ecr:GetDownloadUrlForLayer"
Summary:
Though we haven’t listed all of our code for deployment, we hope we have provided you with some helpful hints and guidance for how one might go about deploying lambdas via container images in an integrated deployment system.
This change will save us a lot of effort on future Python runtime upgrades for our large suite of Lambda functions.
Additional Resources:
If you’re looking for additional guidance with this, here are some resources we used: