From Kubernetes to a Self-Healing, Low-Cost Infrastructure

I've been running a background project on Kubernetes for a while now. It doesn't need 100% uptime, and I didn't want to spend much time managing or even checking up on it, so Kubernetes with Spot VMs seemed the most cost-effective solution, and it has been solid and trouble-free. However, running a couple of preemptible nodes with a managed ingress was still costing about $150 a month. Way too much for a hobby project.

This article is about retaining the self-healing capability you get with Kubernetes while migrating to a much more cost-effective approach (about $40 a month). I found that without Kubernetes in the mix, I could get away with a single VM, but of course that alone doesn't give me recovery from preemption, so here's how to get that too.

Managed instance group

The transition from a high-cost Google Kubernetes Engine (GKE) cluster to a single, self-recovering Spot VM managed by a Stateful Managed Instance Group (MIG) offers a path to significant cost savings without sacrificing resilience. By leveraging Docker Compose and automated infrastructure orchestration, the platform (comprising microservices such as GraphQL gateways, Elasticsearch processors, and Redis queues) now runs at Spot VM pricing while maintaining full recovery capabilities.

Decoupling Compute from State

The core challenge with using Spot VMs is their preemptible nature. To make this architecture immune to data loss during preemption, it utilizes GCP Stateful MIG Policies alongside stable device-id path targeting.

A critical component of this decoupling is Storage State Preservation. Kernel device names (like /dev/sdb) are assigned in attachment order and can change between boots. To guarantee consistency across machine replacements, the persistent volume is targeted via the stable symlink that the guest environment derives from the disk's device name: /dev/disk/by-id/google-existing-data.
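
As a rough illustration of that naming rule, here is a minimal sketch. It assumes the disk is attached with the device name existing-data, so the guest sees it as /dev/disk/by-id/google-existing-data. The function names by_id_path and resolve_disk are hypothetical and are not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: build the stable by-id path for a given device name.
by_id_path() {
  printf '/dev/disk/by-id/google-%s\n' "$1"
}

# Hypothetical helper: print the kernel device (for example /dev/sdb) that the
# stable path currently points at, or fail if the disk is not attached.
resolve_disk() {
  local path
  path=$(by_id_path "$1")
  [ -e "$path" ] || return 1
  readlink -f "$path"
}
```

On the VM, resolve_disk existing-data prints whichever kernel device the disk landed on for this boot, which is exactly the detail the by-id path lets the startup script ignore.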

The Autonomous Recovery Workflow

When Google Cloud preempts a Spot VM, the system triggers a fully automated self-healing pipeline:

• Detection & Provisioning: The MIG detects the deletion and instantly provisions a fresh instance node to maintain the target capacity of one.

• Stateful Attachment: The MIG automatically binds the regional static IP and hot-plugs the persistent block storage to the new node at boot time.

• Guest OS Bootstrapping: A custom startup-script.sh holds execution until hardware attachment is verified. It then mounts the filesystem, installs the Docker Engine, and restarts microservices seamlessly using ./start-all.sh.

This entire process typically brings the platform back online with zero manual intervention within 60–90 seconds.
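
The "holds execution until hardware attachment is verified" step can be sketched as a bounded wait. This is a variation on the approach, not the author's script: the startup script shown later in this article waits with no upper limit, while this version gives up after a timeout so that a missing disk shows up as an error in the logs. The function name wait_for_path is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: wait until a path exists, giving up after a timeout.
# Usage: wait_for_path PATH TIMEOUT_SECONDS
wait_for_path() {
  local path="$1" timeout="$2" waited=0
  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for ${path}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example call in a startup script (the 120 second limit is an arbitrary choice):
# wait_for_path /dev/disk/by-id/google-existing-data 120 || exit 1
```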

Operational States: Preemption vs. Scaling Down

It is vital to distinguish between a True Spot Preemption and a Manual Scale Down to 0.

• Spot Preemption: The MIG's intent is to keep one machine online. Per-instance configurations are preserved, and the recovery is fully autonomous.

• Scale Down to 0: This decommission command destroys the per-instance stateful configuration. When scaling back to 1, the new VM never finishes starting up: the MIG no longer knows to attach the existing disk or IP, so the startup script waits indefinitely for a disk that never appears. Recovery in this scenario requires a manual orchestration script, ./create-mig.sh, to re-bind the regional static IP and existing data disk.
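
One way to tell the two states apart before deciding whether to run ./create-mig.sh is to check whether a per-instance config still exists for the current instance. The sketch below only does the matching; the function name has_instance_config is hypothetical, and the commented gcloud call assumes the list output exposes each config's name field.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the instance name given as $1 appears in a
# list of per-instance config names read from stdin (one name per line).
has_instance_config() {
  grep -Fxq -- "$1"
}

# Possible usage (assumes MIG_NAME, ZONE and INSTANCE_NAME are already set):
# gcloud compute instance-groups managed instance-configs list "$MIG_NAME" \
#   --zone="$ZONE" --format="value(name)" \
#   | has_instance_config "$INSTANCE_NAME" || ./create-mig.sh
```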

Comparing Infrastructure Models

| Aspect | Kubernetes (GKE) | Standalone VM | Stateful MIG |
| --- | --- | --- | --- |
| Compute Cost | High (Cluster + Nodes) | Medium (Standard VM) | Lowest (Spot VM) |
| Preemption Recovery | Automatic | Manual Recreate | Fully Automatic |
| Volume Mounts | PVCs with GCE PD | Local Static /dev/sdb | Stable by-id path |
| IP Persistence | K8s LoadBalancer | Bound to Instance | Preserved via Config |

Docker-compose benefit

Previously I was using Kubernetes, Cloud Build and Artifact Registry to manage my builds and releases. This meant that testing was a bit awkward, involving minikube, ngrok and various other workarounds. Now that I've transitioned to Docker Compose, the exact same scripts and YAML files work both locally on my Mac and on my VM, so I have a complete end-to-end simulation locally.

Replacing the Kubernetes Ingress

If you need to access the VM externally, you're going to need to create some kind of ingress. Under GKE I was using a managed ingress, with Let's Encrypt handling the SSL certificate. On our vanilla VM, we can use a Traefik proxy. All my services run on Docker, as does the Traefik proxy. Here's how to set it up.

start-traefik.sh

#!/bin/bash

GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m'
if [[ "$(uname)" == "Darwin" ]]; then
  echo "Skipping Traefik on Mac (not needed)"
  exit 0
fi

echo -e "${GREEN}🚀 Starting Traefik...${NC}"

# Ensure network exists
docker network create fid-network 2>/dev/null

# Start Traefik
docker compose -f docker-compose-traefik.yml up -d

sleep 3

if curl -s -o /dev/null -w "%{http_code}" http://localhost:80 | grep -q "200\|301\|302"; then
  echo -e "${GREEN}✅ Traefik is running${NC}"
  echo ""
  echo "Traefik is listening on ports 80 (HTTP) and 443 (HTTPS)."
  echo "Check logs: docker compose -f docker-compose-traefik.yml logs -f traefik"
else
  echo -e "${RED}❌ Traefik may not be ready. Check logs.${NC}"
fi

docker-compose-traefik.yml

services:
  traefik:
    image: traefik:v3.2
    container_name: fid-traefik
    command:
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--providers.file.directory=/etc/traefik/conf"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
      - "--certificatesresolvers.letsencrypt.acme.email=admin@xliberation.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--entrypoints.web.http.redirections.entrypoint.permanent=true"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./conf:/etc/traefik/conf
      - traefik_certs:/letsencrypt
    networks:
      - fid-network
    restart: unless-stopped

volumes:
  traefik_certs:

networks:
  fid-network:
    external: true
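
The compose file points Traefik's file provider at ./conf, but the article does not show a file from that directory. The following is a minimal sketch of generating one route definition, not the author's actual configuration. The function name write_route, the host name gateway.example.com and the backend URL http://gateway:4000 are all hypothetical. The entry point websecure and the resolver letsencrypt match the names used in the compose file, and the backend container is assumed to be attached to fid-network.

```shell
#!/bin/bash
# Hypothetical helper: write a Traefik dynamic-config file for one service.
# Usage: write_route CONF_DIR ROUTER_NAME PUBLIC_HOST BACKEND_URL
write_route() {
  local conf_dir="$1" name="$2" host="$3" backend="$4"
  mkdir -p "$conf_dir"
  # Backticks are escaped so the heredoc emits them literally.
  cat > "${conf_dir}/${name}.yml" <<EOF
http:
  routers:
    ${name}:
      rule: "Host(\`${host}\`)"
      entryPoints:
        - websecure
      service: ${name}
      tls:
        certResolver: letsencrypt
  services:
    ${name}:
      loadBalancer:
        servers:
          - url: "${backend}"
EOF
}

# Example (hypothetical names):
# write_route ./conf gateway gateway.example.com http://gateway:4000
```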

Some example scripts

All of this is a little tricky and needs precision, so here are some scripts I have used to get my services running, along with a few hints. I'll assume you already have a reserved static address (if you need to expose your services publicly).

Your startup script (startup-script.sh)

This is mine. Note the device path that uses the by-id symlink. google-existing-data refers to the persistent disk the MIG should attach. This is not the disk name (in my case that is fid-data) but google- followed by the device name (existing-data) that create-mig.sh assigns to the stateful disk attachment. Note the subsequent mount command, which mounts the disk under its own name at /mnt/disks/fid-data. The systemd unit file describes what to do once all that is complete. I have a start-all.sh there that actually starts all my services using the persistent disk fid-data.

#!/bin/bash
set -e

# --- 1. Wait for persistent disk to physically attach ---
echo "Waiting for persistent disk to attach..."
while [ ! -e /dev/disk/by-id/google-existing-data ]; do
  sleep 1
done

# Give the storage driver a brief moment to stabilize the block mapping
sleep 2

# --- 2. Mount persistent disk ---
mkdir -p /mnt/disks/fid-data
if ! mountpoint -q /mnt/disks/fid-data; then
  mount -o discard,defaults /dev/disk/by-id/google-existing-data /mnt/disks/fid-data
fi

# --- 3. Create symlink to repo on persistent disk ---
mkdir -p /home/brucemcpherson
chown -R brucemcpherson:brucemcpherson /home/brucemcpherson
# -n replaces an existing link instead of nesting a new one inside it if this script reruns
ln -sfn /mnt/disks/fid-data/fidmaster /home/brucemcpherson/fidmaster

# --- 4. Install Official Modern Docker Engine & Compose ---
apt-get update -qq
apt-get install -y -qq ca-certificates curl gnupg

# Add Docker's official GPG key and repository
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor --yes -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg

echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install the exact modern packages (restores "docker compose" with a space)
apt-get update -qq
apt-get install -y -qq docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# --- 5. Configure Docker data root ---
if ! grep -q '"data-root": "/mnt/disks/fid-data/docker"' /etc/docker/daemon.json 2>/dev/null; then
  echo '{"data-root": "/mnt/disks/fid-data/docker"}' | tee /etc/docker/daemon.json
  systemctl restart docker
fi

# --- 6. Create systemd service file ---
# Using 'tee' inside the script ensures no root permission redirection blocks
cat << 'EOF' | tee /etc/systemd/system/fid-platform.service > /dev/null
[Unit]
Description=FID Platform (all services)
After=docker.service network.target
Requires=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
User=brucemcpherson
WorkingDirectory=/home/brucemcpherson/fidmaster/vm-docker/local-compose
ExecStartPre=/bin/sleep 5
ExecStart=/bin/bash /home/brucemcpherson/fidmaster/vm-docker/local-compose/start-all.sh
ExecStop=/bin/bash /home/brucemcpherson/fidmaster/vm-docker/local-compose/stop-all.sh
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable fid-platform

# --- 7. Start all services ---
cd /home/brucemcpherson/fidmaster/vm-docker/local-compose
./start-all.sh

echo "Startup complete."

Create a managed group template

Note that the template references this startup-script. If you subsequently change the startup-script, you'll need to replace the template.

gcloud compute instance-templates create fid-vm-template-ephemeral \
  --region=europe-west2 \
  --machine-type=e2-standard-4 \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --boot-disk-size=50GB \
  --boot-disk-type=pd-ssd \
  --provisioning-model=SPOT \
  --metadata-from-file startup-script=startup-script.sh
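
Because the template captures the startup script at creation time, one way to make "replace the template" routine is to derive the template name from the script's content, so every change to the script yields a new name. This is a sketch of that idea rather than what the author does. The function name template_name_for and the 8-character suffix are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper: derive a template name from the startup script content.
# Usage: template_name_for SCRIPT_FILE [PREFIX]
template_name_for() {
  local script="$1" prefix="${2:-fid-vm-template}"
  local digest
  if command -v sha256sum >/dev/null 2>&1; then
    digest=$(sha256sum "$script" | cut -c1-8)
  else
    # macOS ships shasum rather than sha256sum
    digest=$(shasum -a 256 "$script" | cut -c1-8)
  fi
  echo "${prefix}-${digest}"
}

# Possible usage (same flags as the create command above, only the name changes):
# NAME=$(template_name_for startup-script.sh)
# gcloud compute instance-templates create "$NAME" --metadata-from-file startup-script=startup-script.sh
# gcloud compute instance-groups managed set-instance-template fid-mig \
#   --template="$NAME" --zone=europe-west2-b
```

The new template only takes effect for instances created after the switch, so the running VM keeps its old startup script until it is next replaced.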

Create a managed group (create-mig.sh)

The --size=1 means I want a single VM. This VM will be named after the MIG with a random suffix: your_mig_name-random_chars.

#!/bin/bash

# 1. Define your known environment variables
PROJECT_ID="your project"
REGION="europe-west2"
ZONE="europe-west2-b"
MIG_NAME="the name for your new mig"
IP_NAME="your static ip name"
DISK_NAME="your persistent disk name"

# Check if the MIG already exists before trying to create it
if gcloud compute instance-groups managed describe "$MIG_NAME" --zone="$ZONE" >/dev/null 2>&1; then
  echo "Managed Instance Group '$MIG_NAME' already exists. Skipping creation..."
else
  echo "Creating Managed Instance Group '$MIG_NAME'..."
  gcloud compute instance-groups managed create "$MIG_NAME" \
    --zone="$ZONE" \
    --template=fid-vm-template-ephemeral \
    --size=1

  # Give GCP a moment to spin up the instance before querying it
  echo "Waiting 15 seconds for instance to initialize..."
  sleep 15
fi

# 2. Automatically grab the dynamic instance name (instances are named "$MIG_NAME-<suffix>")
INSTANCE_NAME=$(gcloud compute instances list --filter="name~'^${MIG_NAME}-'" --format="value(name)")

# Safety check: Ensure an instance actually exists
if [ -z "$INSTANCE_NAME" ]; then
  echo "Error: No running instance found starting with '$MIG_NAME-'."
  exit 1
fi

echo "Found target instance: $INSTANCE_NAME"

# 3. Handle the per-instance config cleanly (Create if missing, update if exists)
echo "Configuring stateful IP and Disk resources..."

if gcloud compute instance-groups managed instance-configs describe "$MIG_NAME" --instance="$INSTANCE_NAME" --zone="$ZONE" >/dev/null 2>&1; then
  CONFIG_ACTION="update"
else
  CONFIG_ACTION="create"
fi

gcloud compute instance-groups managed instance-configs $CONFIG_ACTION "$MIG_NAME" \
  --zone="$ZONE" \
  --instance="$INSTANCE_NAME" \
  --stateful-external-ip interface-name=nic0,address=projects/"$PROJECT_ID"/regions/"$REGION"/addresses/"$IP_NAME" \
  --stateful-disk=device-name=existing-data,source=projects/"$PROJECT_ID"/zones/"$ZONE"/disks/"$DISK_NAME",auto-delete=never

# 4. Trigger the MIG to apply these stateful settings to the live VM
echo "Applying configurations to $INSTANCE_NAME..."
gcloud compute instance-groups managed update-instances "$MIG_NAME" \
  --zone="$ZONE" \
  --instances="$INSTANCE_NAME"

echo "Done! Dynamic setup complete."
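
The fixed 15 second pause before looking up the instance name is a guess at how long provisioning takes. A possible alternative, sketched here under the assumption that an empty result simply means "not ready yet", is to poll the lookup until it prints something. The function name wait_for_output is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run a command repeatedly until it prints non-empty
# output, then echo that output. Fails after ATTEMPTS tries.
# Usage: wait_for_output ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_output() {
  local attempts="$1" delay="$2" out i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    out=$("$@" 2>/dev/null) || out=""
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Possible usage in place of "sleep 15" (12 tries, 5 seconds apart, both arbitrary):
# INSTANCE_NAME=$(wait_for_output 12 5 gcloud compute instances list \
#   --filter="name~'^${MIG_NAME}-'" --format="value(name)") || exit 1
```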

Ssh to your instance (ssh.sh)

My MIG is called fid-mig, so I can extract the instance name and connect to it like this.

gcloud compute ssh $(gcloud compute instances list --filter="name~'^fid-mig-'" --format="value(name)") --zone=europe-west2-b

Simulate a pre-emption to test (simulate-preemption.sh)

In this case, just deleting the instance will simulate a preemption. It will come back up and execute your startup script without needing intervention. This differs from deliberately resizing the MIG to 0, which would need manual intervention to restart the VM.

# 1. Find the current instance name
INSTANCE_NAME=$(gcloud compute instances list --filter="name~'^fid-mig-'" --format="value(name)")

# 2. Simulate a sudden crash/preemption by deleting the instance body directly
gcloud compute instances delete "$INSTANCE_NAME" --zone=europe-west2-b --quiet

gcloud compute instance-groups managed list

Take down the VM (down-mig.sh)

If you're not using it, you might as well save the cost. All you have to do is set the MIG size to 0.

gcloud compute instance-groups managed resize fid-mig \
  --size=0 \
  --zone=europe-west2-b \
  --quiet

gcloud compute instance-groups managed list

Bring up the VM (up-mig.sh)

Setting the size to 1 will reinstate the VM. However, at this point the MIG knows nothing about the stateful disk and IP it is supposed to attach, so you also need to run create-mig.sh to get back to a running system.

gcloud compute instance-groups managed resize fid-mig \
  --size=1 \
  --zone=europe-west2-b \
  --quiet

gcloud compute instance-groups managed list
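
Since bringing the VM back always needs the resize followed by create-mig.sh, the two steps can be wrapped together. This is a sketch under a few assumptions: the function names mig_down and mig_up and the GCLOUD and REBIND variables are hypothetical, and the wait-until step is an addition that is not in the original scripts. Setting GCLOUD=echo gives a dry run that prints the commands instead of running them.

```shell
#!/bin/bash
GCLOUD="${GCLOUD:-gcloud}"            # set GCLOUD=echo for a dry run
REBIND="${REBIND:-./create-mig.sh}"   # script that re-creates the per-instance config

# Usage: mig_down MIG_NAME ZONE
mig_down() {
  $GCLOUD compute instance-groups managed resize "$1" --size=0 --zone="$2" --quiet
}

# Usage: mig_up MIG_NAME ZONE
# Resizes to 1, waits for the group to settle, then re-binds the disk and IP.
mig_up() {
  $GCLOUD compute instance-groups managed resize "$1" --size=1 --zone="$2" --quiet &&
    $GCLOUD compute instance-groups managed wait-until "$1" --stable --zone="$2" &&
    $REBIND
}

# Example: mig_up fid-mig europe-west2-b
```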
