Refactoring Our Lambdas Because Python Has a Bloating Problem

At Fenris, we use AWS Lambda functions to support our machine learning and inference pipelines. Specifically, we build many of our APIs on inference models that score leads for our customers. With the sunsetting of Python 3.6, we recently had to update some of our functions. Many of the Lambdas we hoped to upgrade required dependencies such as pandas, numpy, scipy, and scikit-learn. These packages are significantly larger in Python 3.9 and above. The nefarious dependency bloat strikes again! The issue is that AWS enforces a 250 MB limit on the unzipped size of a function’s code and packages when it is deployed from a zip archive. In a few cases where many of these packages were required, we weren’t able to meet this limit. A higher limit (with a correspondingly higher cost) would solve this challenge for many users, but since that isn’t currently offered, we turned to other options.
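To see how close a function is to that limit before deploying, it can help to measure the unzipped size of the code plus its installed dependencies. The following is a minimal sketch, not part of our pipeline: the `build` directory name and the helper names are assumptions made for illustration.

```python
import os
import sys

# 250 MB unzipped limit for zip-based Lambda deployments.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def fits_in_zip_deployment(path, limit=LAMBDA_UNZIPPED_LIMIT_BYTES):
    """Return True if the directory is within the unzipped size limit."""
    return directory_size_bytes(path) <= limit


if __name__ == "__main__":
    # "build" is a hypothetical directory holding the code and its dependencies,
    # e.g. the result of `pip install -r requirements.txt --target build`.
    target = sys.argv[1] if len(sys.argv) > 1 else "build"
    size_mb = directory_size_bytes(target) / (1024 * 1024)
    verdict = "fits" if fits_in_zip_deployment(target) else "exceeds the limit"
    print(f"{target}: {size_mb:.1f} MB ({verdict})")
```

Running this against a directory that contains pandas, numpy, scipy, and scikit-learn installed for the new runtime gives a quick indication of whether a zip deployment is still feasible.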
Options
When tackling this challenge, we considered the following options:
- Mounting an EFS volume with the necessary packages on the Lambda that needed extra storage space for dependencies
- Shrinking the large dependencies by stripping out code that wasn’t necessary for our purposes
- Deploying Lambdas via container images through AWS ECR (Elastic Container Registry)
We eventually ruled out the first option after exploring the complexities of automating the upload of dependencies to an EFS volume. Specifically, uploading or updating files on an EFS volume has to be done from compute that mounts the volume, such as an EC2 instance, which was complex to build into our CI/CD systems. EFS makes more sense for exploratory and ad-hoc projects. We quickly ruled out the second approach due to the high maintenance cost of customized, reduced packages. That left the containers approach, which the rest of this post describes.
Overview
The rest of this post is dedicated to explaining how we went about our containerized Lambda approach.
The general procedure that we followed:
- Building an image
- Manually retagging said image
- Deploying our function via SAM, which handles the publishing of our image to ECR
Getting Started:
After opening up the project for the Lambda function that you’re hoping to upgrade, make sure you are in a virtual environment that reflects the new Python version you’re planning on upgrading to. Install the appropriate updates to requirements and make sure those changes are reflected in your requirements.txt file.
One easy way to do that is to install pur via pip install --upgrade pur. pur reads requirements files, finds the latest packages for the current virtual environment’s runtime, and rewrites the requirements.txt file to contain references to those latest package versions. Find usage instructions here.
File Structure
The screen capture below shows the file structure used for the rest of the article.
Lambda Configuration
- You’ll have to add a Dockerfile to your project directory. This is what SAM uses to create and configure your container images. The Dockerfile should contain something similar to the following.
FROM public.ecr.aws/lambda/python:3.9
RUN yum update -y --security
COPY /project_dir/* ./
COPY requirements.txt ./
RUN python3.9 -m pip install -r requirements.txt
# Command can be overwritten by providing a different command in the template directly.
CMD ["overridden-command"]
- At the top of your SAM Lambda template, you should include the following under the Parameters section. This value will be specified in your deployment steps.
CommitTag:
  Type: String
  Description: commit tag to be associated with a deployed image
Assuming your functions are being deployed via ZIP files with SAM (as ours were), you should do the following:
- For each function in your SAM template, you’ll want to remove the following. Each function will have some or all of these under the Properties section:
CodeUri: project_dir/
Handler: handler.lambda_handler
Runtime: python3.6
- In addition, remove any layers listed for each function.
- You will need to add the following.
Note: PackageType and ImageConfig go under the Properties section. The Metadata section is not part of Properties; it sits at the same level as Properties. Each setting is explained after the snippet.
Properties:
  PackageType: Image
  ImageConfig:
    Command: [ "handler.lambda_handler" ]
Metadata:
  Dockerfile: Dockerfile
  DockerContext: ./
  DockerTag: !Ref CommitTag # specify CommitTag as a parameter at the top of the template
- PackageType should be set to Image.
- The Command parameter under ImageConfig should be set to the path of your Lambda handler function.
- Under the Metadata section:
  - Dockerfile should be set to the name of the Dockerfile you created in your project directory. Dockerfile is the standard naming convention.
  - DockerContext should be set to the directory that contains your requirements.txt, Dockerfile, and project code folders.
  - DockerTag should remain !Ref CommitTag.
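For reference, the Command value "handler.lambda_handler" means a function named lambda_handler in a module named handler.py, which the Dockerfile copies from project_dir into the image’s working directory. A minimal sketch of such a module follows; the event shape and the averaging "model" are placeholders invented for illustration, not our production scoring code.

```python
import json


def lambda_handler(event, context):
    """Entry point referenced as "handler.lambda_handler" in ImageConfig.Command."""
    # Hypothetical payload shape: {"features": [<numbers>]}.
    features = event.get("features", [])
    # Placeholder "model": a real function would load its model once at import
    # time and call its predict method here.
    score = sum(features) / len(features) if features else 0.0
    return {
        "statusCode": 200,
        "body": json.dumps({"score": score}),
    }
```

If the handler lives in a different module or function, the Command value changes accordingly, for example "app.main" for a function main in app.py.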
Challenges:
Though most of the challenges with this process were associated with deployment and AWS permissions configuration, we made some missteps with the Lambda configuration as well. Most notably, make sure the DockerContext specified in the Metadata section is a directory that contains the following:
- The Dockerfile
- The requirements.txt file that specifies project dependencies
- The Lambda code
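A quick pre-build check can catch a misconfigured DockerContext before SAM fails partway through a build. The sketch below is an illustration only: the function name is hypothetical, and it assumes the Lambda code sits in a folder called project_dir, as in the Dockerfile above.

```python
from pathlib import Path


def missing_from_docker_context(context_dir, code_dir="project_dir"):
    """Return the required entries that are absent from the Docker context."""
    context = Path(context_dir)
    required_files = ["Dockerfile", "requirements.txt"]
    missing = [name for name in required_files if not (context / name).is_file()]
    # The Lambda code is expected in a sub-directory of the context (assumption).
    if not (context / code_dir).is_dir():
        missing.append(code_dir + "/")
    return missing


if __name__ == "__main__":
    problems = missing_from_docker_context("./")
    if problems:
        raise SystemExit("DockerContext is missing: " + ", ".join(problems))
    print("DockerContext looks complete")
```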
Deployment Configuration:
We recognize that deployment steps are often unique to a project or organization, so rather than providing all of the details of our deployment logic, we’ve provided the following example of the core of our dev deployment job for containerized Lambdas. We are a GitLab shop, but this deployment pattern should be easily transferable to other deployment systems such as GitHub Actions. Here’s what our dev-deployment step looks like:
.deploy_containerized_lambda: &deploy_containerized_lambda
  - ECR_REPOSITORY="${ACCOUNT_NUMBER}.dkr.ecr.us-east-1.amazonaws.com"
  - aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ECR_REPOSITORY
  - sam build --template sam-lambdas.yaml --no-cached --parameter-overrides CommitTag=$DOCKER_IMAGE_TAG
  - sam deploy --image-repository="$ECR_REPOSITORY/$REPOSITORY_NAME" --role-arn $CLOUDFORMATION_EXECUTION_ROLE --stack-name ${STACK_NAME} --s3-prefix ${ENV} --parameter-overrides Environment=${ENV} CommitTag=$DOCKER_IMAGE_TAG --region $REGION --no-fail-on-empty-changeset
  - docker rmi $(docker images --filter=reference="*:$DOCKER_IMAGE_TAG" -q) --force

deploy-dev:
  resource_group: dev
  environment: dev
  image: docker:latest
  stage: deploy
  needs: ["pytest"]
  services:
    - docker:dind
  script:
    - export ENV="dev"
    - *git_install
    - *setup_aws_sam_tools
    - *assume_cloudformation_execution_role
    - aws cloudformation deploy --template-file ./path-to-infrastructure-template.yaml --role-arn $CLOUDFORMATION_EXECUTION_ROLE --no-fail-on-empty-changeset --stack-name ${STACK_NAME}-infrastructure --parameter-overrides Environment=${ENV} RepositoryName=$REPOSITORY_NAME --region $REGION
    - *deploy_containerized_lambda
  only:
    - branches
  except:
    refs:
      - main
Important notes:
- Variables needing clarification:
  - ACCOUNT_NUMBER: AWS account number
  - DOCKER_IMAGE_TAG: set to "commit-tag-${CI_COMMIT_SHORT_SHA}"
  - REGION: AWS region
- Line 18 (*git_install): installs our templates repository so we can access the ECR repo CFN template
- Line 19 (*setup_aws_sam_tools): installs aws_sam_cli, pip, etc.
- Line 20 (*assume_cloudformation_execution_role): assumes the CloudFormation deployment role (which has the necessary permissions) for the following deployment steps
- Line 21 (aws cloudformation deploy): deploys the ECR repo via the ECR CFN template
- Line 22 (*deploy_containerized_lambda): runs the deploy hook defined at the top of the snippet
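To make the relationship between these variables concrete, here is a small sketch of how the registry host, image repository, and image tag fit together, and how the tag is passed to sam build. The function names and the example account number, repository name, and commit SHA are hypothetical; the string formats and flags mirror the job definition above.

```python
def ecr_registry(account_number, region="us-east-1"):
    """Registry host, mirroring ECR_REPOSITORY in the deploy hook."""
    return f"{account_number}.dkr.ecr.{region}.amazonaws.com"


def docker_image_tag(commit_short_sha):
    """Image tag, mirroring DOCKER_IMAGE_TAG."""
    return f"commit-tag-{commit_short_sha}"


def image_repository(account_number, repository_name, region="us-east-1"):
    """Value passed to `sam deploy --image-repository`."""
    return f"{ecr_registry(account_number, region)}/{repository_name}"


def sam_build_command(image_tag, template="sam-lambdas.yaml"):
    """Argument list equivalent to the `sam build` line in the deploy hook."""
    return [
        "sam", "build",
        "--template", template,
        "--no-cached",
        "--parameter-overrides", f"CommitTag={image_tag}",
    ]


if __name__ == "__main__":
    # Made-up example values.
    tag = docker_image_tag("1a2b3c4d")
    print(image_repository("123456789012", "my-lambda-repo"))
    print(" ".join(sam_build_command(tag)))
```

Because the tag is derived from the commit SHA, every pipeline run produces a distinct image tag, which is what lets SAM detect a change to the function’s image.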
Quick fixes to some issues:
- If you add any tasks to the after_script, you must redefine any environment variables such as STACK_NAME or ENV.
- If you want to reference any of the helper jobs used via YAML hooks in the gitlab-ci-templates repository, you can use the following syntax: - !reference [.helper_job_name]. This is equivalent to using - *helper_job_name for helper jobs defined in the project’s .gitlab-ci.yml file.
Finally, remember: Lambda functions that have already been deployed as zip files cannot be converted in place into functions deployed via containers. Either delete the zip-based functions first and then deploy the containerized functions, or rename the stacks and deploy the new functions under the new stack names.
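Before switching a stack over, it can be useful to list which existing functions are still zip-packaged. The sketch below is one way to do that, assuming boto3 is installed and AWS credentials are configured; the helper name and the example function name are hypothetical. It relies on the PackageType field returned by the Lambda GetFunctionConfiguration API.

```python
def zip_packaged_functions(lambda_client, function_names):
    """Return the names of functions whose current package type is Zip."""
    zipped = []
    for name in function_names:
        config = lambda_client.get_function_configuration(FunctionName=name)
        # Treating a missing PackageType as Zip is an assumption of this sketch.
        if config.get("PackageType", "Zip") == "Zip":
            zipped.append(name)
    return zipped


if __name__ == "__main__":
    # Imported here so the helper above can be exercised without AWS access.
    import boto3

    client = boto3.client("lambda")
    # "lead-scoring-dev" is a made-up function name.
    print(zip_packaged_functions(client, ["lead-scoring-dev"]))
```

Any function this reports would need to be deleted, or deployed under a renamed stack, before its containerized replacement can go out.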
Challenges:
The most challenging part of this whole process was configuring permissions appropriately, especially across accounts. We ran into cross-account permissions issues when deploying Lambdas in our prod account. This could also be an issue when building systems in any account other than the one in which functions are initially deployed. The key permissions needed by the IAM role that our CI/CD process assumes when running the deploy step, for both the ECR infrastructure and the Lambda deployments, were:
"ecr:SetRepositoryPolicy", "ecr:GetRepositoryPolicy", "ecr:DescribeRepositories", "ecr:BatchCheckLayerAvailability", "ecr:BatchGetImage", "ecr:GetAuthorizationToken", "ecr:GetDownloadUrlForLayer"
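As an illustration of how these actions might be assembled into a policy document, here is a minimal sketch that generates the JSON. It is not our actual policy: the statement IDs, the function name, and the example repository ARN are assumptions. The split into two statements reflects the fact that ecr:GetAuthorizationToken does not support resource-level restrictions and therefore needs a Resource of "*".

```python
import json

# The actions listed above.
ECR_ACTIONS = [
    "ecr:SetRepositoryPolicy",
    "ecr:GetRepositoryPolicy",
    "ecr:DescribeRepositories",
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetAuthorizationToken",
    "ecr:GetDownloadUrlForLayer",
]


def ecr_deploy_policy(repository_arn="*"):
    """Build an IAM policy document granting the ECR actions listed above."""
    # GetAuthorizationToken is account-wide, so it always gets Resource "*".
    token_actions = ["ecr:GetAuthorizationToken"]
    repo_actions = [a for a in ECR_ACTIONS if a not in token_actions]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EcrAuth",
                "Effect": "Allow",
                "Action": token_actions,
                "Resource": "*",
            },
            {
                "Sid": "EcrRepository",
                "Effect": "Allow",
                "Action": repo_actions,
                "Resource": repository_arn,
            },
        ],
    }


if __name__ == "__main__":
    # The ARN below is a made-up example.
    arn = "arn:aws:ecr:us-east-1:123456789012:repository/my-lambda-repo"
    print(json.dumps(ecr_deploy_policy(arn), indent=2))
```

Scoping the repository statement to a specific ARN is optional; leaving the default of "*" reproduces the broadest form of the permissions.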
Summary:
Though we haven’t listed all of our code for deployment, we hope we have provided you with some helpful hints and guidance for how one might go about deploying lambdas via container images in an integrated deployment system.
This change will save us a lot of effort on future Python runtime upgrades across our large suite of Lambda functions.
Additional Resources:
If you’re looking for additional guidance with this, here are some resources we used:
