Published on January 5, 2024

Docker: Dependency Layers for Quicker CI

Anden Acitelli
Sr. Software Engineer at Akkio

As software engineering evolves, a solid Continuous Integration + Continuous Deployment (CI/CD) process is becoming more and more of a staple. These processes involve using services like GitHub Actions or CircleCI to automate various tasks triggered by pull requests, pushes to a given branch, and lots more.

One issue we ran into here at Akkio was that our CI runs were taking forever. As a machine learning startup, we naturally have plenty of Python dependencies, and they took six to seven minutes to install, even with the dependency caching that CircleCI handles for us automatically. For example, here's a five-minute build where the majority of the time was spent just installing dependencies!

By using the following approach, we were able to cut about five minutes off of every CI run, a drastic improvement to developer experience. So, how did we do it?

Docker!

Docker is the industry-standard containerization tool. CircleCI allows you to run a given job inside a given Docker image, so our thought process logically went to: what if we could bake these dependencies into the image itself? Pulling down a Docker image is much faster than installing the dependencies on every run.

So, we did that! We ended up creating two Dockerfiles - one for our JavaScript dependencies (i.e. package.json) and one for our Python dependencies (i.e. requirements.txt).

Our JavaScript one looks like this.


# deps.javascript.Dockerfile
FROM ubuntu:20.04

# apt-get Dependencies
RUN apt-get update && apt-get install -y \
    curl \
    git \
    locales

# Node.js + Dependencies
RUN curl -fsSL https://deb.nodesource.com/setup_18.x | bash
RUN apt-get update && apt-get install -y nodejs

# Bake the npm dependencies into the image so the npm i in CI is mostly cache hits
RUN mkdir app
WORKDIR app
COPY package.json .
RUN npm i -g
RUN npm i

Our Python one looks like this.


# deps.python.Dockerfile
FROM python:3.8.16

# apt-get Dependencies
RUN apt-get update && apt-get install -y \
    curl \
    git \
    locales \
    python3-pip \
    unzip \
    wget

# Need Docker in CI to build and push images
RUN curl -fsSL https://get.docker.com -o get-docker.sh
RUN sh get-docker.sh

# Also install dockerize; unsure if still actively needed with Docker above, but can't hurt to include
ENV DOCKERIZE_VERSION v0.6.1
RUN wget https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz
RUN tar -C /usr/local/bin -xzvf dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz
RUN rm dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz

# Bake the Python dependencies into the image so CI doesn't reinstall them from scratch
RUN mkdir app
WORKDIR app
COPY ml/requirements.txt .
RUN python3 -m pip install -r requirements.txt
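
With both Dockerfiles written, we build the images and push them to our container registry so CI can pull them. A minimal sketch of what that looks like from a shell; the registry path and image names below are placeholders, not our actual repos:

# Build and push the dependency images (registry path and tags are placeholders)
docker build -f deps.javascript.Dockerfile -t registry.example.com/deps/javascript:latest .
docker push registry.example.com/deps/javascript:latest

docker build -f deps.python.Dockerfile -t registry.example.com/deps/ml-server:latest .
docker push registry.example.com/deps/ml-server:latest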

Embedding into CI

So, now we have Dockerfiles with our dependencies directly embedded in them. How do we actually use these in CI to speed up our build times? It's actually pretty straightforward for most continuous integration services; we personally use CircleCI, so that's what we'll demonstrate with.

Here’s a minimal config.yml that should be sufficient to show the concepts:


version: 2.1
jobs:
  build:
    docker:
      - image: /deps/javascript:latest
    steps:
      - checkout
      - run:
          command: |
            npm i
            npm run test

We simply pull our dependencies image, do an npm i (which will be 99%+ cache hits; it's best to do it anyway as insurance in case your dependencies image isn't fully up-to-date), and then we can run our tests!

The process is near-identical for Python, so we won’t show it.

Automatically Rebuilding Dependencies Images

The above will work fine, but requires you to manually rebuild your dependencies image every so often to keep it up-to-date. The longer you wait to do this, the fewer cache hits you'll get during CI and the longer your day-to-day pull request CI will take to run, hurting your developer experience.
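
One lightweight option, sketched below with placeholder names rather than our actual setup, is to tag the dependencies image with a hash of requirements.txt, so it's obvious at a glance whether the image still matches what's in the repo:

# Tag the deps image with a hash of requirements.txt so staleness is easy to spot
# (registry path is a placeholder)
DEPS_HASH=$(sha256sum ml/requirements.txt | cut -c1-12)
docker build -f deps.python.Dockerfile -t registry.example.com/deps/ml-server:"$DEPS_HASH" .
docker push registry.example.com/deps/ml-server:"$DEPS_HASH"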

So, what you can do, and what we’d recommend, is to build and push this image automatically in CI every time you get a push on your main branch, or whatever your equivalent is. The CircleCI workflow for this looks something like the following.


version: 2.1

# The aws-ecr orb needs to be declared for the commands below; the version here is illustrative.
orbs:
  aws-ecr: circleci/aws-ecr@8.2.1

jobs:
  build-image-deps-python:
    docker:
      - image: cimg/base:current
    steps:
      - checkout
      - when:
          condition:
            equal: [ master, << pipeline.git.branch >> ]
          steps:
            - aws-ecr/ecr-login:
                session-duration: "120"
            - aws-ecr/build-and-push-image:
                dockerfile: deps.python.Dockerfile
                repo: deps/ml-server

You may have to change the methodology a bit depending on the Docker registry you're using (we're using ECR), but the general process should be the same.
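
If you're not on ECR, the same job boils down to plain docker commands against whatever registry you use. A rough sketch, assuming credentials are available as CI environment variables; the variable names and image name below are placeholders:

# Log in and push to a generic registry; DOCKER_USER / DOCKER_PASS are placeholder env vars
echo "$DOCKER_PASS" | docker login -u "$DOCKER_USER" --password-stdin
docker build -f deps.python.Dockerfile -t "$DOCKER_USER"/deps-ml-server:latest .
docker push "$DOCKER_USER"/deps-ml-server:latest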

Conclusion

Now, you have quick CI runs and are automatically pushing up dependency updates! This is a great way to save valuable CI and developer time, and it most definitely pays off in the long term. Hope you learned something!
