Refactoring Lambdas Because Python is Bloated

Refactoring Our Lambdas Because Python Has a Bloating Problem

At Fenris, we use AWS Lambda functions to support our machine learning and inference pipelines. Specifically, we build many of our APIs on inference models that score leads for our customers. With the sunsetting of Python 3.6, we recently had to update some of our functions. Many of the Lambdas we hoped to upgrade required dependencies such as pandas, numpy, scipy, and scikit-learn, and these packages are significantly larger in Python 3.9 and above. The nefarious dependency bloat strikes again!

The issue is that AWS enforces a 250 MB size limit on the unzipped code and packages (layers included) of a function deployed as a ZIP file. In a few cases where many of these packages were required, we weren’t able to meet this limit. A higher limit (with a correspondingly higher cost) would solve this challenge for many users, but since that isn’t currently available, we turned to other options.
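
To get a feel for how close a set of dependencies comes to the 250 MB limit, one option is to install them into a scratch directory and add up the file sizes. The following is a minimal sketch, not part of our pipeline; the function names and the interpretation of 250 MB as 250 * 1024 * 1024 bytes are assumptions made for illustration.

```python
import os

# Assumed reading of the 250 MB limit for ZIP-based deployments.
ZIP_UNPACKED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_zip_limit(root, limit=ZIP_UNPACKED_LIMIT_BYTES):
    """Return True if the directory's contents stay within the given limit."""
    return directory_size_bytes(root) <= limit
```

For example, after running pip install -r requirements.txt --target build/, calling fits_zip_limit("build") gives a rough yes/no answer before any deployment is attempted.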

Options

When tackling this challenge, we considered the following options:

Mount an EFS volume holding the necessary packages on each Lambda that needs extra storage space for dependencies
Shrink the large dependencies by stripping out code that wasn’t necessary for our purposes
Deploy Lambdas via container images through AWS ECR (Elastic Container Registry)

We eventually ruled out the first option after exploring the complexities of automating the upload of dependencies to an EFS volume. Specifically, uploading or updating files on an EFS volume has to be done from a compute resource that mounts the volume, such as an EC2 instance, and that was complex to build into our CI/CD systems. EFS makes more sense for exploratory and ad-hoc projects. We quickly ruled out the second option due to the high maintenance cost of customized, reduced packages. That left the third option, containers.

Overview

The rest of this post is dedicated to explaining how we went about our containerized Lambda approach.

The general procedure that we followed:

Building an image
Manually retagging said image
Deploying our function via SAM, which handles the publishing of our image to ECR

Getting Started:

After opening up the project for the Lambda function that you’re hoping to upgrade, make sure you are in a virtual environment that reflects the new Python version you’re planning on upgrading to. Install the appropriate updates to requirements and make sure those changes are reflected in your requirements.txt file.

One easy way to do that is to install pur via pip install --upgrade pur. pur reads requirements files, finds the latest packages for the current virtual environment’s runtime, and rewrites the requirements.txt file to contain references to those latest package versions. Find usage instructions here.

File Structure

The screen capture below shows the file structure used for the rest of the article.

Lambda Configuration

You’ll have to add a Dockerfile to your project directory. This is what SAM uses to create and configure your container images. The Dockerfile should contain something similar to the following.

FROM public.ecr.aws/lambda/python:3.9

RUN yum update -y --security

COPY /project_dir/* ./
COPY requirements.txt ./

RUN python3.9 -m pip install -r requirements.txt

# Command can be overwritten by providing a different command in the template directly.
CMD ["overridden-command"]

At the top of your SAM lambda template, you should include the following under the Parameters section. This value will be specified in your deployment steps.

CommitTag:
  Type: String
  Description: commit tag to be associated with a deployed image

Assuming your functions are being deployed via ZIP files with SAM (as ours were), you should do the following:

For each function in your SAM template, you’ll want to remove the following. Each function will have some or all of these properties under its Properties section:

      CodeUri: project_dir/
      Handler: handler.lambda_handler
      Runtime: python3.6

Remove any layers listed for each function as well.
You will then need to add the following.

Note: PackageType and ImageConfig go under the Properties section. The Metadata section is not part of the Properties section; it sits at the same level. Each setting is described after the snippet.

      PackageType: Image
      ImageConfig:
        Command: [ "handler.lambda_handler" ]
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./
      DockerTag: !Ref CommitTag # specify CommitTag as a parameter at the top of the template

PackageType should be set to Image.
The Command parameter under ImageConfig should be set to the path of your Lambda handler function, in module.function form.
Under the Metadata section:
1. Dockerfile should be set to the name of the Dockerfile you created in your project directory. Dockerfile is the standard naming convention.
2. DockerContext should be set to the path of the directory that contains your requirements.txt, Dockerfile, and project code folders.
3. DockerTag should remain !Ref CommitTag
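
For reference, a Command value of handler.lambda_handler means the container must hold a module named handler that exposes a function named lambda_handler. A minimal sketch of such a module follows; the "lead" key and the response shape are made up for illustration and are not our actual scoring code.

```python
# handler.py: module name matches the first half of "handler.lambda_handler".
import json


def lambda_handler(event, context):
    """Entry point that the Lambda runtime calls for each invocation."""
    # `event` is the invocation payload; `context` carries runtime metadata.
    lead = event.get("lead", {})  # "lead" is a hypothetical payload key
    return {
        "statusCode": 200,
        "body": json.dumps({"received_fields": sorted(lead)}),
    }
```

Because the Dockerfile copies the contents of the project directory into the image's working directory, a file like this is importable by the runtime under the name handler.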

Challenges:

Though most of the challenges with this process were associated with the deployment process and AWS permissions configuration, we did encounter some missteps with the Lambda configuration as well. Most notably, make sure the DockerContext specified in the Metadata section is a directory that contains the following:

The Dockerfile
The requirements.txt file that specifies project dependencies
The Lambda code
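
A quick pre-build check can catch a misconfigured DockerContext before SAM does. This is a minimal sketch under the assumption that the code lives in a folder called project_dir, as in the Dockerfile above; the function name is hypothetical.

```python
from pathlib import Path


def missing_context_entries(context_dir, code_dir="project_dir"):
    """Return the required entries that are absent from the Docker build context."""
    root = Path(context_dir)
    required = ["Dockerfile", "requirements.txt", code_dir]
    return [name for name in required if not (root / name).exists()]
```

An empty list means all three entries are present; anything else names what still needs to be moved into the context directory.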

Deployment Configuration:

We recognize that deployment steps are often unique to a project or organization, so rather than providing all of the details of our deployment logic, we’ve provided the following example of the core of our dev deployment job for containerized Lambdas. We are a GitLab shop, but this deployment pattern should be easily transferable to other deployment systems such as GitHub Actions. Here’s what our dev-deployment step looks like:

.deploy_containerized_lambda: &deploy_containerized_lambda
  - ECR_REPOSITORY="${ACCOUNT_NUMBER}.dkr.ecr.us-east-1.amazonaws.com"
  - aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ECR_REPOSITORY
  - sam build --template sam-lambdas.yaml --no-cached --parameter-overrides CommitTag=$DOCKER_IMAGE_TAG
  - sam deploy --image-repository="$ECR_REPOSITORY/$REPOSITORY_NAME" --role-arn $CLOUDFORMATION_EXECUTION_ROLE --stack-name ${STACK_NAME} --s3-prefix ${ENV} --parameter-overrides Environment=${ENV} CommitTag=$DOCKER_IMAGE_TAG --region $REGION --no-fail-on-empty-changeset
  - docker rmi $(docker images --filter=reference="*:$DOCKER_IMAGE_TAG" -q) --force

deploy-dev:
  resource_group: dev
  environment: dev
  image: docker:latest
  stage: deploy
  needs: ["pytest"]
  services:
    - docker:dind
  script:
    - export ENV="dev"
    - *git_install
    - *setup_aws_sam_tools
    - *assume_cloudformation_execution_role
    - aws cloudformation deploy --template-file ./path-to-infrastructure-template.yaml --role-arn $CLOUDFORMATION_EXECUTION_ROLE --no-fail-on-empty-changeset --stack-name ${STACK_NAME}-infrastructure --parameter-overrides Environment=${ENV} RepositoryName=$REPOSITORY_NAME --region $REGION
    - *deploy_containerized_lambda
  only:
    - branches
  except:
    refs:
      - main

Important notes:

Variables needing clarification:
- ACCOUNT_NUMBER: AWS account number
- DOCKER_IMAGE_TAG: set to "commit-tag-${CI_COMMIT_SHORT_SHA}"
- REGION: AWS region
*git_install: installs our templates repository so we can access the ECR repo CFN template
*setup_aws_sam_tools: installs aws_sam_cli, pip, etc.
*assume_cloudformation_execution_role: assumes the CloudFormation deployment role (granted the necessary permissions) for the deployment steps that follow
aws cloudformation deploy: deploys the ECR repo via the ECR CFN template
*deploy_containerized_lambda: runs the deploy hook defined at the top of the listing
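
The DOCKER_IMAGE_TAG convention noted above can be sketched in a few lines. The helper below is hypothetical and only illustrates the naming scheme; the validation reflects Docker's documented tag rules (at most 128 characters drawn from letters, digits, underscores, periods, and dashes, not starting with a period or dash).

```python
import re

# Docker tag grammar: 1 to 128 characters, first one not "." or "-".
_TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def docker_image_tag(short_sha, prefix="commit-tag-"):
    """Build the image tag for a commit and reject anything Docker would refuse."""
    tag = f"{prefix}{short_sha}"
    if not _TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"not a valid Docker tag: {tag!r}")
    return tag
```

In a GitLab job, the short SHA would come from the CI_COMMIT_SHORT_SHA environment variable.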

Quick fixes to some issues:

If you add any tasks to the after_script, you must redefine any environment variables such as STACK_NAME or ENV.
If you want to reference any of the helper jobs used via YAML hooks in the gitlab-ci-templates repository, you can use the following syntax: - !reference [.helper_job_name]. This is equivalent to using - *helper_job_name for helper jobs defined in the project’s .gitlab-ci.yml file.

Finally, remember: Lambda functions that have already been deployed as zip files cannot be converted into functions deployed via containers. The zip functions must first be deleted before the containerized functions can be deployed; alternatively, rename the stacks and deploy the new functions under the new stack names.
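
To find out which existing functions are still zip-based before a migration, one could query each function's configuration. The sketch below assumes a boto3 Lambda client is passed in, for example boto3.client("lambda"); the helper name is hypothetical, and errors such as a missing function are not handled.

```python
def zip_functions(lambda_client, function_names):
    """Return the names of functions whose current package type is Zip."""
    found = []
    for name in function_names:
        config = lambda_client.get_function_configuration(FunctionName=name)
        # PackageType is reported as "Zip" or "Image"; treat a missing field as "Zip".
        if config.get("PackageType", "Zip") == "Zip":
            found.append(name)
    return found
```

Any function this returns would need to be deleted, or deployed under a renamed stack, before its containerized replacement can go out.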

Challenges:

The most challenging part of this whole process was configuring permissions appropriately, especially across accounts. We ran into cross-account permissions issues when deploying Lambdas in our prod account, and the same could happen when building in any account other than the one in which the functions were initially deployed. The key permissions needed by the IAM role that our CI/CD process assumes when running the deploy step, for both the ECR infrastructure and the Lambda deployments, were:

"ecr:SetRepositoryPolicy",
"ecr:GetRepositoryPolicy",
"ecr:DescribeRepositories",
"ecr:BatchCheckLayerAvailability",
"ecr:BatchGetImage",
"ecr:GetAuthorizationToken",
"ecr:GetDownloadUrlForLayer"

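As an illustration of how the actions listed above might be assembled into a policy document, here is a minimal sketch. The split into two statements is an assumption on our part for this example: ecr:GetAuthorizationToken does not support resource-level permissions and so needs "Resource": "*", while the remaining actions are scoped to a repository ARN supplied by the caller. The function name and the ARN in the usage note are placeholders, and this is not our exact policy.

```python
import json

ECR_ACTIONS = [
    "ecr:SetRepositoryPolicy",
    "ecr:GetRepositoryPolicy",
    "ecr:DescribeRepositories",
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetAuthorizationToken",
    "ecr:GetDownloadUrlForLayer",
]


def build_ecr_policy(repository_arn):
    """Build an IAM policy document granting the ECR actions listed above."""
    token_action = "ecr:GetAuthorizationToken"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # The login token is account-wide, so it cannot be scoped to a repository.
            {"Effect": "Allow", "Action": [token_action], "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": [a for a in ECR_ACTIONS if a != token_action],
                "Resource": repository_arn,
            },
        ],
    }
```

Calling json.dumps(build_ecr_policy("arn:aws:ecr:us-east-1:123456789012:repository/example"), indent=2) yields JSON that can be pasted into a role definition and adjusted from there.
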
Summary:

Though we haven’t listed all of our deployment code, we hope we have provided some helpful hints and guidance on how one might deploy Lambdas via container images in an integrated deployment system.

This change will save us a lot of effort on future Python runtime upgrades across our large suite of Lambda functions.

Additional Resources:

If you’re looking for additional guidance with this, here are some resources we used:
