Janos Pasztor

Immutable Infrastructure in Practice — Part 2

In the previous part we discussed how to set up a server based on the principle of immutability. Now that we have a server and a way to deploy the Docker package, let’s discuss how we set up our Docker containers.

To Registry or not to Registry

The simplest way to deploy the Docker package would be to use a Docker registry. However, as I’m using an IaaS platform, I don’t have a registry readily available. I could, of course, pay for an external registry, but to keep things simple and reproducible, I opted for a different solution.

In my case I created a docker-compose file for the whole setup. All the files required to build the Docker setup are included in a single directory and are deployed to the server as described in the previous article. Then docker-compose build and docker-compose up are run.
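
The deployment step itself then boils down to a handful of commands. A rough sketch, assuming the files were placed in a directory such as /srv/deploy by the process from part 1 (the path is an assumption):

# Build the images locally and start the stack in the background.
cd /srv/deploy
docker-compose build
docker-compose up -d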

Traefik

Our first container is going to be Traefik. Traefik is an amazing reverse proxy that supports Docker for routing requests, and also supports Let's Encrypt as a way to generate certificates.

The docker-compose.yaml part looks quite simple:

---
version: '3.2'
services:
  traefik:
    build:
      context: images/traefik
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /srv/acme:/etc/traefik/acme
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - "BACKUP_DEST=s3://${ACME_BUCKET_NAME}"
      - "BACKUP_ENDPOINT=https://sos-${EXOSCALE_REGION}.exo.io"
      - "BACKUP_BUCKETURL=%(bucket)s.sos-${EXOSCALE_REGION}.exo.io"
      - "EXOSCALE_KEY=${EXOSCALE_KEY}"
      - "EXOSCALE_SECRET=${EXOSCALE_SECRET}"
    restart: always

As mentioned in the previous article, the idea here is that the ACME (certificate) file is backed up every minute, and when the server starts, the certificate file is restored. This is important because Let's Encrypt has rate limits, and while working on the infrastructure I kept running into them.

With that in mind, the Traefik container is built on Alpine Linux, and a shell script is started instead of Traefik directly. This script contains the restore procedure:

#!/bin/sh

# Restore acme.json from the object storage if this container starts fresh.
if [ ! -f /etc/traefik/acme/acme.json ]; then
    # Write a temporary s3cmd configuration with the object storage credentials.
    echo "[default]
host_base = sos-${EXOSCALE_REGION}.exo.io
host_bucket = ${BACKUP_BUCKETURL}
access_key = ${EXOSCALE_KEY}
secret_key = ${EXOSCALE_SECRET}
use_https = True
" > ~/.s3cfg
    s3cmd sync --host=${BACKUP_ENDPOINT} ${BACKUP_DEST} /etc/traefik/acme
    rm ~/.s3cfg
fi

# Scrub the credentials from the environment before handing off to Traefik.
export EXOSCALE_KEY=
export EXOSCALE_SECRET=

Not terribly complicated: it simply copies the acme.json file from the object storage to the local filesystem. If the file does not exist in the bucket, Traefik will generate a new one, and the backup job, running in a separate container, will store it on the object storage.
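
For context, here is roughly what the Traefik configuration baked into that image needs to cover, written as a heredoc the image build could use. This is a sketch assuming Traefik 1.x (which matches the frontend-style labels used later in this article); the email address and the exact file contents are assumptions, not the author's actual config:

# Sketch of a Traefik 1.x configuration covering what this article relies on:
# both entry points, the Docker backend, ACME storage on the restored volume,
# and the Prometheus metrics endpoint scraped later on port 8080.
cat > /etc/traefik/traefik.toml <<'EOF'
defaultEntryPoints = ["http", "https"]

[entryPoints]
  [entryPoints.http]
  address = ":80"
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]

[docker]
endpoint = "unix:///var/run/docker.sock"
exposedByDefault = false

[acme]
email = "webmaster@example.com"
storage = "/etc/traefik/acme/acme.json"
entryPoint = "https"
onHostRule = true
  [acme.httpChallenge]
  entryPoint = "http"

[api]

[metrics]
  [metrics.prometheus]
EOF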

nginx

Now on to nginx. The container itself is pretty simple. In the docker-compose file we add some labels to the container, which Traefik reads in order to route the appropriate requests to the nginx container:

version: '3.2'
services:
  #...
  nginx:
    build:
      context: images/nginx
    volumes:
      - /srv/www:/var/www
    labels:
      traefik.enable: "true"
      traefik.backend: "nginx"
      traefik.frontend.rule: "Host:${DOMAIN}"
      traefik.port: "80"
      traefik.protocol: "http"
      traefik.frontend.headers.SSLTemporaryRedirect: "false"
      traefik.frontend.headers.SSLRedirect: "true"
      traefik.frontend.headers.STSSeconds: "315360000"
      traefik.frontend.headers.STSIncludeSubdomains: "true"
      traefik.frontend.headers.STSPreload: "true"
      traefik.frontend.headers.forceSTSHeader: "false"
    restart: always
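
Once the stack is up, a quick smoke test from the host shows the label-driven routing at work; the domain below is a stand-in for whatever ${DOMAIN} expands to:

# HTTP should answer with a redirect to HTTPS (per the SSLRedirect label),
# and HTTPS should serve the site; -k skips certificate verification since
# we connect via localhost rather than the real hostname.
curl -sI -H "Host: www.example.com" http://localhost/ | head -n 1
curl -skI -H "Host: www.example.com" https://localhost/ | head -n 1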

Pulling content

As I mentioned before, my website is built with Jekyll, which generates static HTML files. To deploy the content of my website I use CircleCI, which runs Jekyll and then copies the output to the object storage.

So I need a way to pull this content to the server. Normally one would use a cron job for this task, but crond does not like running in containers, especially not without root privileges. Thankfully there's Supercronic, a cron runner built for containers that doesn't need root. As an added bonus, Supercronic won't start a new run of a job while the previous run is still in progress, so the jobs won't pile up in parallel.

After containerizing Supercronic, we add a crontab that invokes a small pull script (the crontab one-liner itself is shown further below). The script looks as follows:

#!/bin/bash

set -e

# Write a temporary s3cmd configuration with the object storage credentials.
echo "[default]
host_base = sos-${EXOSCALE_REGION}.exo.io
host_bucket = ${PULL_BUCKETURL}
access_key = ${EXOSCALE_KEY}
secret_key = ${EXOSCALE_SECRET}
use_https = True
" > ~/.s3cfg

# Mirror the bucket into the webroot; --delete-removed also deletes local
# files that no longer exist in the bucket.
s3cmd sync --delete-removed --host=${PULL_ENDPOINT} ${PULL_SOURCE} /var/www/htdocs

rm ~/.s3cfg

# Record the timestamp of the last successful pull for monitoring.
echo "last_pull $(date +%s)" > /srv/monitoring/last_pull.prom

Most of it is pretty simple. One interesting tidbit is in the last line: I save the timestamp of the last pull to a file. This file is later read by the Prometheus node exporter (via its textfile collector) and added to monitoring. If the pull fails for whatever reason, the timestamp goes stale and the monitoring system will alert.
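
The crontab that Supercronic reads is then just one schedule line per job; the interval and script path here are illustrative, not the author's exact values:

# Crontab read by Supercronic: pull the site content every minute.
* * * * * /usr/local/bin/pull.sh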

Backups

We also need to back up two things: the acme.json file from Traefik, and the metrics from Prometheus. The method is the same as for the content pull: run it under Supercronic and write the timestamp into a metric file.
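
Sketching that out under the same assumptions as the pull script (the Prometheus data path, the PROMETHEUS_BACKUP_DEST variable, and the metric name are hypothetical):

#!/bin/bash

set -e

# Temporary s3cmd configuration, same pattern as the pull script.
echo "[default]
host_base = sos-${EXOSCALE_REGION}.exo.io
host_bucket = ${BACKUP_BUCKETURL}
access_key = ${EXOSCALE_KEY}
secret_key = ${EXOSCALE_SECRET}
use_https = True
" > ~/.s3cfg

# Push the certificate store and the Prometheus data to the object storage.
s3cmd sync /srv/acme/acme.json ${BACKUP_DEST}
s3cmd sync --delete-removed /srv/prometheus/ ${PROMETHEUS_BACKUP_DEST}

rm ~/.s3cfg

# Record the timestamp of the last successful backup for monitoring.
echo "last_backup $(date +%s)" > /srv/monitoring/last_backup.prom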

Prometheus

Prometheus is my monitoring tool of choice. Its concept is ingenious: it does one thing and it does it well. Every 15 seconds (or whatever interval you set), it scrapes a number of HTTP endpoints and downloads the metrics they publish. So for effective monitoring I need a little bit of extra tooling: in total, a Prometheus container as well as three metrics collectors, called exporters.
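
A scrape is nothing more exotic than an HTTP GET returning plain-text metrics. For example, from a container on the same network (the sample output is illustrative):

# Fetch what Prometheus would scrape from the nginx exporter.
curl -s http://nginx-exporter:9113/metrics | head -n 3
# nginx_connections_active 1
# nginx_http_requests_total 4242
# nginx_up 1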

The exporters I use are the following:

- the Prometheus node exporter, for host-level metrics (CPU, memory, disk),
- the nginx exporter, for nginx connection and request metrics,
- cAdvisor, for per-container metrics.

In addition to those I also use the built-in exporter in Traefik to gather Traefik metrics.

My prometheus.yml file looks as follows:

global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'nginx'
    static_configs:
      - targets: ['nginx-exporter:9113']
  - job_name: 'node'
    static_configs:
      - targets: ['172.28.0.1:9100']
  - job_name: 'docker'
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'traefik'
    static_configs:
      - targets: ['traefik:8080']

Tip: The node exporter needs to run in network mode host, so it can't participate in the docker-compose network. However, I assign the container IP addresses manually, so I can use the host server's address on the container network (its gateway, 172.28.0.1, as seen in the scrape configuration above) to access its data.
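
As a rough stand-in for that service definition, the node exporter could be started like this; the image and flags are the stock upstream ones, and the textfile directory matches the .prom files written earlier, but treat the details as assumptions:

# Node exporter on the host network, with the textfile collector pointed at
# the directory where the pull and backup jobs write their .prom files.
docker run -d --net=host --pid=host \
  -v /:/host:ro,rslave \
  -v /srv/monitoring:/srv/monitoring:ro \
  prom/node-exporter \
  --path.rootfs=/host \
  --collector.textfile.directory=/srv/monitoring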

Grafana

The last piece of the puzzle is Grafana. Grafana is my tool of choice for displaying dashboards and also for sending alerts to VictorOps if one of the metrics goes out of bounds. My Grafana is public, feel free to take a look.

Since I'm doing immutable infrastructure, I have a problem though: if I created my dashboards and metrics by clicking around in the UI, I would have to back up my Grafana database and restore it, similar to how I back up Prometheus.

However, I opted for a different solution. Grafana has a feature called provisioning, where dashboards can be provisioned from files. In other words, I provide the dashboard JSON as a file, which Grafana loads on start. When I want to change something, I click the changes together in the dashboard and then save the exported JSON in my git repository for reprovisioning with Terraform.
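
A minimal sketch of that provisioning as it could be baked into the Grafana image at build time; /etc/grafana/provisioning is Grafana's default provisioning path, while the file names and the dashboard directory are assumptions:

# Tell Grafana to load dashboards from a directory baked into the image.
mkdir -p /etc/grafana/provisioning/dashboards /etc/grafana/dashboards
cat > /etc/grafana/provisioning/dashboards/dashboards.yaml <<'EOF'
apiVersion: 1
providers:
  - name: 'default'
    type: file
    options:
      path: /etc/grafana/dashboards
EOF
# The dashboard JSON exported from the UI and versioned in git.
cp home.json /etc/grafana/dashboards/home.json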

Like before, the Grafana container gets some labels so Traefik can route the requests appropriately:

version: '3.2'
services:
  # ...
  grafana:
    build:
      context: images/grafana
    environment:
      - "GF_SERVER_DOMAIN=monitoring.${DOMAIN}"
      - "GF_SERVER_ROOT_URL=https://monitoring.${DOMAIN}"
      - "GF_SERVER_ENFORCE_DOMAIN=true"
      - "GF_USERS_AUTO_ASSIGN_ORG_ROLE=Admin"
      - "GF_USERS_AUTO_ASSIGN_ORG=true"
      - "GF_AUTH_DISABLE_LOGIN_FORM=true"
      - "GF_AUTH_ANONYMOUS_ENABLED=true"
      - "GF_AUTH_ANONYMOUS_ORG_NAME=Main Org."
      - "GF_AUTH_GITHUB_ENABLED=true"
      - "GF_AUTH_GITHUB_SCOPES=user:email,read:org"
      - "GF_AUTH_GITHUB_AUTH_URL=https://github.com/login/oauth/authorize"
      - "GF_AUTH_GITHUB_TOKEN_URL=https://github.com/login/oauth/access_token"
      - "GF_AUTH_GITHUB_API_URL=https://api.github.com/user"
      - "GF_AUTH_GITHUB_ALLOW_SIGNUP=true"
      - "GF_AUTH_GITHUB_CLIENT_ID=${GITHUB_CLIENT_ID}"
      - "GF_AUTH_GITHUB_CLIENT_SECRET=${GITHUB_CLIENT_SECRET}"
      - "GF_AUTH_GITHUB_ALLOWED_ORGANIZATIONS=opsbears"
      - "GF_SECURITY_SECRET_KEY=${GRAFANA_SECRET_KEY}"
      - "GF_SECURITY_COOKIE_SECURE=true"
      - "GF_SECURITY_COOKIE_SAMESITE=true"
      - "GF_ANALYTICS_REPORTING_ENABLED=false"
      - "BACKUP_DEST=s3://${GRAFANA_BUCKET_NAME}"
      - "BACKUP_ENDPOINT=https://sos-${EXOSCALE_REGION}.exo.io"
      - "BACKUP_BUCKETURL=%(bucket)s.sos-${EXOSCALE_REGION}.exo.io"
      - "EXOSCALE_KEY=${EXOSCALE_KEY}"
      - "EXOSCALE_SECRET=${EXOSCALE_SECRET}"
    volumes:
      - "/srv/grafana:/var/lib/grafana"
    labels:
      traefik.enable: "true"
      traefik.backend: "grafana"
      traefik.frontend.rule: "Host:monitoring.${DOMAIN}"
      traefik.port: "3000"
      traefik.protocol: "http"
      traefik.frontend.headers.SSLTemporaryRedirect: "false"
      traefik.frontend.headers.SSLRedirect: "true"
      traefik.frontend.headers.STSSeconds: "315360000"
      traefik.frontend.headers.STSIncludeSubdomains: "true"
      traefik.frontend.headers.STSPreload: "true"
      traefik.frontend.headers.forceSTSHeader: "false"
      traefik.frontend.redirect.regex: "^https://monitoring.${DOMAIN}/$$"
      traefik.frontend.redirect.replacement: "https://monitoring.${DOMAIN}/d/mXFB_yRWz/home?orgId=1"
      traefik.frontend.redirect.permanent: "true"
    restart: always
    networks:
      internal:
        ipv4_address: 172.28.1.11

Summary

This setup took me a couple of weeks to build. You can definitely go simpler, but as my CDN experiment showed me, not having monitoring or a well-built, tested setup will cost time in the long run. Now I can be confident that my setup is future-proof.

