ARG to Rescue: Reuse Variables in Multistage Dockerfile • CeamKrier

Introduction

The Docker ecosystem is rich with tools and best practices that streamline containerization. One of these practices is using multistage builds to create lean, efficient containers. However, as your Dockerfiles grow more complex, managing variables and maintaining readability can become a challenge. Enter the ARG instruction—your key to sharing variables across stages in a multistage Dockerfile. In this blog post, we’ll explore how ARG can simplify your Dockerfiles, enhance reusability, and maintain cleaner code.

Why Use Multistage Builds?

Multistage builds in Docker allow you to use multiple FROM statements in a single Dockerfile, creating separate stages that can be used to build a final image. This method is particularly useful for creating lightweight production images, as you can copy only the necessary artifacts from earlier stages. Here’s a quick example to illustrate:

# Stage 1: Build stage
FROM golang:1.16 AS builder
WORKDIR /app
COPY . .
# Disable cgo so the binary is statically linked and runs on Alpine (musl)
RUN CGO_ENABLED=0 go build -o myapp .

# Stage 2: Production stage
FROM alpine:latest
COPY --from=builder /app/myapp /usr/local/bin/myapp
CMD ["myapp"]

In this simple example, the build stage compiles a Go application, and the production stage creates a minimal image containing only the compiled binary.

While multistage builds are powerful, they can introduce a common challenge: variable reuse. Suppose you need to use a specific version of an application or a common path across multiple stages. Without a way to share variables, you might end up duplicating code, leading to maintenance headaches and potential errors.

Here’s where ARG (argument) comes into play. The ARG instruction allows you to define variables that can be used throughout your Dockerfile, even across different stages.

Introducing ARG: The Basics

The ARG instruction defines a variable that users can pass at build time to customize the build process. Unlike environment variables (ENV), which are persisted in the image and set in every container started from it, ARG values exist only while the image is being built and are not set in running containers. They can still appear in the image history, so they are not a safe place for secrets.
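
As a minimal sketch of that difference, the Dockerfile below sets one variable of each kind. BUILD_LABEL and APP_MODE are made-up names used only for illustration. The RUN step can read both values during the build, while a container started from the image only sees the ENV value:

```dockerfile
FROM alpine:3.12

# Build-time only: readable by instructions in this stage during the build
ARG BUILD_LABEL=dev

# Persisted: stored in the image config and set in every container
ENV APP_MODE=production

# During the build both values are available
RUN echo "build: label=${BUILD_LABEL} mode=${APP_MODE}"

# At run time BUILD_LABEL is not set, so the shell falls back to "unset"
CMD ["sh", "-c", "echo run: label=${BUILD_LABEL:-unset} mode=${APP_MODE}"]
```

Running the resulting image should print label=unset next to mode=production, since only the ENV value survives into the container.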

Let’s start with a basic example:

# Define the argument with a default value
ARG BASE_IMAGE=alpine:3.12

# Use the argument in the FROM instruction
FROM ${BASE_IMAGE}

# Redeclare the argument (no value) so it is visible after FROM
ARG BASE_IMAGE
RUN echo "This image is based on ${BASE_IMAGE}"

In this example, BASE_IMAGE is an argument that can be overridden when building the image. The default value is alpine:3.12, but you could specify a different base image at build time:

docker build --build-arg BASE_IMAGE=ubuntu:20.04 -t custom-image .

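The real power of ARG comes into play with multistage builds. An ARG declared before the first FROM sits outside every stage: FROM lines can use it directly, but each stage has to redeclare it (ARG NAME, with no value) before its other instructions can read it. Let’s look at a more advanced example: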

# Define an argument for the Go version
ARG GO_VERSION=1.16

# Stage 1: Build stage
FROM golang:${GO_VERSION} AS builder
ARG GO_VERSION
WORKDIR /app
COPY . .
# Disable cgo so the binary is statically linked and runs on Alpine (musl)
RUN CGO_ENABLED=0 go build -o myapp .

# Stage 2: Production stage
FROM alpine:latest
ARG GO_VERSION
RUN echo "Built with Go version ${GO_VERSION}"
COPY --from=builder /app/myapp /usr/local/bin/myapp
CMD ["myapp"]

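In this Dockerfile, we define GO_VERSION once at the top, before the first FROM. The FROM line of the build stage can read it directly to select the Go image. Instructions inside a stage cannot, which is why the production stage repeats ARG GO_VERSION (without a value) before echoing the Go version used. The redeclaration in the build stage is not needed by any instruction here, but it keeps the value available if a later RUN in that stage wants it.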
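
The scoping rule is easiest to see side by side. The following is a small sketch, not part of the example above; APP_VERSION and the stage names are hypothetical and chosen only to show the three possible cases:

```dockerfile
# Global scope: usable in FROM lines, invisible inside stages until redeclared
ARG APP_VERSION=1.0.0

FROM alpine:3.12 AS no_redeclare
# APP_VERSION was not redeclared here, so this prints an empty value
RUN echo "no_redeclare: '${APP_VERSION}'"

FROM alpine:3.12 AS redeclared
# Redeclaring without a value picks up the global default (or --build-arg)
ARG APP_VERSION
RUN echo "redeclared: '${APP_VERSION}'"

FROM alpine:3.12 AS overridden
# Redeclaring with a value sets a new default for this stage only
ARG APP_VERSION=2.0.0
RUN echo "overridden: '${APP_VERSION}'"
```

BuildKit only builds the stages the target depends on, so each stage would likely need to be built on its own to see its output, for example with docker build --target no_redeclare --progress=plain .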

Advanced Usage: Combining ARG with Environment Variables

You might find it useful to combine ARG with ENV to set environment variables conditionally based on build arguments. This can further enhance your Dockerfile’s flexibility.

The following example uses ARG and ENV together to create a generic Dockerfile for a Turborepo application:

ARG APP_NAME="web"
ARG PNPM_HOME="/root/.local/share/pnpm"

FROM node:20-alpine AS base

FROM base AS builder
# Set working directory
WORKDIR /app

ARG APP_NAME
ARG PNPM_HOME
ENV PNPM_HOME=${PNPM_HOME}
ENV PATH="${PATH}:${PNPM_HOME}"

RUN corepack enable
# Install Turborepo globally; pin a version (turbo@<version>) for reproducible builds
RUN pnpm add -g turbo
COPY . .
# Collect all the necessary dependencies for the project
RUN turbo prune ${APP_NAME} --docker

# Add lockfile and package.json's of isolated subworkspace
FROM base AS installer

WORKDIR /app

ARG APP_NAME
ARG PNPM_HOME
ENV PNPM_HOME=${PNPM_HOME}
ENV PATH="${PATH}:${PNPM_HOME}"

RUN corepack enable

# First install dependencies (as they change less often)
COPY .gitignore .gitignore
COPY --from=builder /app/out/json/ .
COPY --from=builder /app/out/pnpm-lock.yaml ./pnpm-lock.yaml
RUN pnpm install

# Build the project and its dependencies
COPY --from=builder /app/out/full/ .
COPY turbo.json turbo.json

# Build the app
RUN pnpm turbo build --filter=${APP_NAME}

FROM base AS production

WORKDIR /app

ARG APP_NAME
ARG PNPM_HOME
ENV PNPM_HOME=${PNPM_HOME}
ENV PATH="${PATH}:${PNPM_HOME}"
ENV NODE_ENV="production"

RUN corepack enable

COPY --from=installer /app .

USER node

WORKDIR /app/apps/${APP_NAME}

CMD pnpm start

Key Points on Environment Variables in Multistage Builds

Environment Variables (ENV):
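- Are set in the image produced by the stage where they are defined, and persist into containers run from that image.
- Do not carry over to unrelated stages; a stage only inherits them when its FROM names the stage (or image) that set them.
- If you need an environment variable in multiple independent stages, you have to redefine it or pass it via ARG.
Build Arguments (ARG):
- Can be declared once before the first FROM and then brought into any stage by redeclaring them there.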
- Provide a way to share configuration details like versions or paths between stages.
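
To make the inheritance point concrete, here is a sketch of one way to reduce the repeated ARG/ENV pairs, assuming every later stage starts FROM base. It reuses PNPM_HOME from the Turborepo example; the stage named unrelated is hypothetical and exists only for contrast:

```dockerfile
ARG PNPM_HOME="/root/.local/share/pnpm"

FROM node:20-alpine AS base
# Redeclare the global ARG once, then persist it as ENV in the base stage
ARG PNPM_HOME
ENV PNPM_HOME=${PNPM_HOME}
ENV PATH="${PATH}:${PNPM_HOME}"

FROM base AS builder
# ENV values are part of the base image, so they are already set here
RUN echo "PNPM_HOME=${PNPM_HOME}"

FROM node:20-alpine AS unrelated
# This stage does not start from base, so PNPM_HOME is empty here
RUN echo "PNPM_HOME='${PNPM_HOME}'"
```

Whether this is preferable to redeclaring in each stage is a design choice: setting ENV in the base keeps the file shorter, but it also means the variable ends up in every image derived from base, including the final one.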

Debugging ARG Variables

When working with ARG variables, you might run into issues where arguments aren’t passed correctly or variables aren’t set as expected. Here are some tips to help you debug:

Check Build Logs: Use docker build with the --progress=plain flag to get more detailed logs that can help identify where arguments are being used or missed.
```
docker build --progress=plain -t debug-image .
```
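Echo Variables: Add RUN echo statements to print the values of your ARG variables during the build process. Place the echo after the ARG declaration in the same stage; an empty value usually means the argument was not redeclared in that stage.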
```
RUN echo "ARG BASE_IMAGE=${BASE_IMAGE}"
```
Use Default Values: Define sensible default values for your ARG variables to ensure that your build doesn’t fail if arguments are not provided.
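
Sometimes no default makes sense and you would rather have the build stop early with a clear message. One possible sketch of that, using APP_NAME as an example of a required argument (the check itself is ordinary shell, not a Docker feature):

```dockerfile
FROM alpine:3.12

# No default: the value must come from --build-arg APP_NAME=...
ARG APP_NAME

# Stop the build with a readable message when the argument is empty
RUN test -n "${APP_NAME}" || (echo "APP_NAME build arg is required" >&2; exit 1)

RUN echo "Building ${APP_NAME}"
```

Building without --build-arg APP_NAME=... should fail at the test step, which tends to be easier to diagnose than an empty value surfacing several steps later.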

Best Practices for Using ARG

Define ARG Variables Early: Declare shared ARG variables before the first FROM so that every stage can opt in by redeclaring them. An ARG at the top is not visible inside a stage until that stage redeclares it.
Use Descriptive Names: Choose meaningful names for your arguments to make the Dockerfile easier to understand and maintain.
Avoid Secrets in ARG: Never use ARG to pass sensitive data like passwords or API keys, as they can be exposed in the Docker image history.
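
For values that really are sensitive, BuildKit secret mounts are the usual alternative to ARG. The sketch below assumes BuildKit is enabled and uses a hypothetical secret id, api_token, backed by a local file of your choosing:

```dockerfile
# syntax=docker/dockerfile:1
FROM alpine:3.12

# The secret is mounted at /run/secrets/api_token for this RUN step only
# and is not stored in an image layer or in the build arguments
RUN --mount=type=secret,id=api_token \
    test -s /run/secrets/api_token && echo "secret is available for this step only"

# Build with (file path is an example):
#   docker build --secret id=api_token,src=./api_token.txt .
```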

Conclusion

Using ARG to share variables across stages in a multistage Dockerfile can significantly improve your Docker builds’ maintainability and flexibility. Whether you’re building lightweight production images or dynamically configuring your builds, ARG provides a powerful tool to streamline and enhance your Dockerfile.