Docker

From BloomWiki

How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.

Docker is a platform for packaging software into self-contained units called containers. A container bundles an application together with everything it needs to run (code, runtime, libraries, configuration) so it behaves identically regardless of where it is deployed. Docker solved the pervasive "works on my machine" problem and became the foundation of modern software deployment.

Remembering[edit]

Key terms:

  • Container – a lightweight, isolated process that runs an application and its dependencies, sharing the host OS kernel.
  • Image – a read-only snapshot that defines a container's filesystem and configuration. Running an image produces a container.
  • Dockerfile – a text file of instructions used to build an image layer by layer.
  • Docker Hub – the default public registry where images are stored and shared.
  • Registry – a server that stores and serves Docker images (public: Docker Hub, GitHub Container Registry; private: AWS ECR, GCP Artifact Registry).
  • Layer – each instruction in a Dockerfile produces a layer; layers are cached and reused across builds to speed up rebuilds.
  • Volume – a mechanism for persisting data outside the container's ephemeral filesystem.
  • Port mapping – connecting a port on the host machine to a port inside the container (e.g., -p 8080:80).
  • Docker Compose – a tool for defining and running multi-container applications via a YAML file.
  • Orchestration – managing multiple containers across multiple machines (Kubernetes is the dominant tool).

Distinction: a virtual machine runs a full OS on emulated hardware; a container shares the host kernel and is thus much lighter (megabytes rather than gigabytes, seconds to start rather than minutes).

Understanding[edit]

Containers work through two Linux kernel features:

  • Namespaces – isolate what a process can see (its own process tree, network interfaces, filesystem, hostname). Each container has its own namespaces, so it cannot see or interfere with processes in other containers.
  • cgroups (control groups) – limit and account for resource usage (CPU, memory, disk I/O). A container can be capped at, say, 512MB RAM regardless of what the host has available.
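These cgroup limits can be set directly from the Docker CLI; a quick sketch (the container name and image are just examples):

```shell
# Cap the container at 512MB of RAM and one CPU via cgroups.
docker run -d --name capped --memory=512m --cpus=1 nginx:alpine

# Confirm the limit is enforced (see the MEM USAGE / LIMIT column).
docker stats --no-stream capped
```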

Docker adds a layer of tooling on top: image management, a build system, a networking model, and a runtime (containerd).

When you run a container from an image, Docker creates a thin writable layer on top of the read-only image layers. The image itself is never modified. Multiple containers can run from the same image simultaneously, each with its own writable layer; this is why images are efficient to share and fast to start.
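This copy-on-write behavior is easy to observe; a sketch (container names are arbitrary):

```shell
# Two containers from the same image get independent writable layers.
docker run -d --name a alpine:3.19 sleep 300
docker run -d --name b alpine:3.19 sleep 300

docker exec a touch /tmp/only-in-a
docker exec b ls /tmp     # b's writable layer is unaffected
docker diff a             # lists /tmp/only-in-a as an added file
```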

The image layer cache is key to understanding build performance: Docker compares each Dockerfile instruction to its cache. The first changed instruction invalidates the cache for all subsequent instructions. Ordering Dockerfile instructions from least-to-most-frequently-changed minimizes unnecessary rebuilds.

Applying[edit]

Common workflows:

Pull and run an existing image
docker pull postgres:16

docker run -d --name mydb -e POSTGRES_PASSWORD=secret -p 5432:5432 postgres:16

Write a Dockerfile for a Node.js app
FROM node:20-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]

Build and run it
docker build -t my-app:latest .

docker run -d -p 3000:3000 my-app:latest

Docker Compose for a web app + database
services:
 web:
   build: .
   ports:
     - "3000:3000"
   environment:
     DATABASE_URL: postgres://user:pass@db:5432/mydb
   depends_on:
     - db
 db:
   image: postgres:16
   volumes:
     - pg_data:/var/lib/postgresql/data
   environment:
     POSTGRES_PASSWORD: pass

volumes:
 pg_data:

Inspect and debug
docker logs my-app          # view stdout/stderr

docker exec -it my-app sh   # open a shell inside a running container

docker inspect my-app       # full JSON metadata about a container

Analyzing[edit]

Image size matters. Large images take longer to push, pull, and start. Common sources of bloat:

  • Using a full OS base image (ubuntu:latest ~80MB) instead of a slim or Alpine variant (~5MB)
  • Build tools (compilers, dev dependencies) left in the final image
  • File copies that include unnecessary files (.git, node_modules on Node, __pycache__ on Python)

Multi-stage builds solve the build-tools problem: use one stage with a full build environment, then copy only the compiled artifact into a minimal final stage.

FROM golang:1.22 AS builder

WORKDIR /app
COPY . .
# Disable cgo so the binary is statically linked and runs on musl-based Alpine.
RUN CGO_ENABLED=0 go build -o server .

FROM alpine:3.19
COPY --from=builder /app/server /server
CMD ["/server"]

The final image contains only Alpine and the compiled binary; no Go toolchain.

Layer cache invalidation is a common performance trap. Copying the full source before installing dependencies means any code change invalidates the dependency install layer. Always copy dependency manifests first, install, then copy source code.
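The fix in Dockerfile form, matching the Node.js example above:

```dockerfile
# Bad: any source change invalidates the cached npm install layer.
# COPY . .
# RUN npm ci

# Good: copy dependency manifests first; the install layer is
# reused until package*.json itself changes.
COPY package*.json ./
RUN npm ci
COPY . .
```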

Container statefulness: containers are designed to be ephemeral. Any data written inside a container's writable layer is lost when the container is removed. Databases, uploaded files, and anything that must persist must use volumes or external storage.
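A named volume outliving its container, sketched (names are illustrative):

```shell
# Create a named volume and mount it at Postgres's data directory.
docker volume create pg_data
docker run -d --name mydb -v pg_data:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret postgres:16

# Removing the container does not remove the volume or its data.
docker rm -f mydb
docker volume inspect pg_data
```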

Networking modes: containers on the same Docker network can reach each other by service name. Containers on different networks cannot see each other by default. The default bridge network does not provide DNS resolution between containers � use a named network with Compose or explicit network creation.
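Name resolution on a user-defined network, sketched (network and container names are illustrative):

```shell
# Containers on the same user-defined network resolve each other by name.
docker network create appnet
docker run -d --name api --network appnet nginx:alpine

# A second container can reach "api" by name via Docker's embedded DNS.
docker run --rm --network appnet alpine:3.19 wget -qO- http://api/
```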

Evaluating[edit]

Signs of a mature Docker setup:

  • Images are built from minimal base images and use multi-stage builds; final image size is justified.
  • Dockerfile instructions are ordered to maximize cache hits (dependencies before source code).
  • No secrets (API keys, passwords) appear in ENV instructions or are COPY'd into the image; they are injected at runtime via environment variables or secret management tools.
  • Containers run as a non-root user (USER instruction) to limit blast radius if compromised.
  • Health checks are defined so orchestrators know when a container is actually ready.
  • Images are tagged with specific versions or git SHAs, not just "latest", so deployments are reproducible.
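Several of these practices combined in one sketch (the /health endpoint is an assumption about the application):

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# Run as the unprivileged "node" user the base image ships with.
USER node

EXPOSE 3000

# Assumes the app serves a /health endpoint; orchestrators use this
# to decide when the container is actually ready.
HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget -qO- http://localhost:3000/health || exit 1

CMD ["node", "server.js"]
```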

Expert trade-offs:

  • Docker Compose vs. Kubernetes: Compose is appropriate for local development and simple single-host production deployments. Kubernetes is necessary when you need automatic scheduling across multiple hosts, self-healing, rolling updates, and fine-grained resource management, but brings significant operational complexity.
  • Build in CI, not locally: developer-built images are not reproducible. Build and push images from a CI pipeline (GitHub Actions, GitLab CI, etc.) from a clean environment with pinned dependencies.
  • Image scanning: production images should be scanned for known CVEs before deployment (Trivy, Grype, or native registry scanning). A clean build can contain a vulnerable base image.
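A scan can be a single pipeline step; for example with Trivy (the tag is a placeholder git SHA):

```shell
# Fail the pipeline if the image contains high or critical CVEs.
trivy image --severity HIGH,CRITICAL --exit-code 1 my-app:3f9c2e1
```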

Creating[edit]

Designing a container-based deployment pipeline end to end:

Repository structure
One Dockerfile per service, colocated with the service code. A root docker-compose.yml (or compose.override.yml) wires services together for local development with environment-specific overrides.
CI/CD pipeline
On every commit: lint the Dockerfile (hadolint), build the image, run tests inside the container (to validate the runtime environment, not just the code), scan the image for vulnerabilities, and push to the registry tagged with the git SHA. On merge to main: tag the image as a release candidate and trigger deployment.
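A minimal GitHub Actions sketch of that pipeline (the registry path is a placeholder, the presence of hadolint and Trivy on the runner is an assumption, and registry login steps are elided):

```yaml
name: build
on: [push]
jobs:
  image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Lint the Dockerfile, then build an image tagged with the git SHA.
      - run: hadolint Dockerfile
      - run: docker build -t ghcr.io/example/my-app:${{ github.sha }} .
      # Scan before pushing; a clean build can still ship a vulnerable base image.
      - run: trivy image --exit-code 1 ghcr.io/example/my-app:${{ github.sha }}
      - run: docker push ghcr.io/example/my-app:${{ github.sha }}
```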
Environment configuration
Never bake environment-specific config into the image. Use environment variables (injected by the platform at runtime), a secrets manager (AWS Secrets Manager, HashiCorp Vault), or Kubernetes Secrets. The same image binary runs in every environment; only config changes.
Logging and observability
Containers should log to stdout/stderr (not files). The orchestration layer collects and forwards logs to a central system (CloudWatch, Datadog, ELK). Add structured logging (JSON) so logs are queryable, not just grep-able.
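A minimal structured-logging sketch in JavaScript (the field names are arbitrary): each event is one JSON object per line on stdout, so the collector can query fields instead of grepping text.

```javascript
// Emit one JSON object per line to stdout for the log collector.
function logEvent(level, message, fields = {}) {
  const entry = {
    ts: new Date().toISOString(),
    level,
    message,
    ...fields,
  };
  console.log(JSON.stringify(entry));
  return entry; // returned for convenience
}

logEvent('info', 'request handled', { path: '/health', status: 200, ms: 12 });
```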
Graceful shutdown
Handle SIGTERM in your application to finish in-flight requests before exiting. Containers that exit immediately on SIGTERM drop in-flight traffic. Most orchestrators send SIGTERM, wait a grace period (default 30s), then SIGKILL.