For a while, I have been running my services in my own Kubernetes cluster set up on Hetzner. It does the job, but over time it has occasionally run into a few random issues that made me want to revisit it:
- A few version upgrades caused the entire cluster to go haywire and I had to manually repair it.
- While it's possible to run a minimal version of the cluster, that setup cannot do automatic upgrades.
- Running multiple nodes while keeping the bill minimal means running multiple small nodes, so the cluster is constantly at the memory warning level and any deployment shifts things around, which is annoying.
The biggest problem is that the overhead is just too large for the few small things I want to run. I've been using Nix/NixOS for a lot of my personal stuff, so I thought it was probably time to bite the bullet on my internet presence as well.
Installing NixOS
Originally I thought this was going to be the tedious part: manually creating a server, deploying a first-stage config, SSHing into it to finish the job, and only then being able to deploy new configs remotely. To make it worse, Hetzner does not have a NixOS snapshot to start from, so you would have to create a NixOS snapshot yourself as a backup to recover from in the future …
Fortunately, that's not the case: someone has already figured out the hard part, and it's called nixos-anywhere. It can SSH into a new server, download NixOS and kexec into it, repartition and format the drive, then install NixOS with your config on it. All in one step!
In practice, you will need to set up a flake with a host config like this (just an example, configure it however you like!):
{
  lib,
  pkgs,
  ...
}: {
  imports = [
    ./disk-config.nix
    ./hardware-configuration.nix
    ...
  ];
  disko.devices = {
    disk.main = {
      type = "disk";
      device = "/dev/sda";
      content = {
        type = "gpt";
        partitions = {
          boot = {
            size = "1M";
            type = "EF02";
          };
          esp = {
            size = "512M";
            type = "EF00";
            content = {
              type = "filesystem";
              format = "vfat";
              mountpoint = "/boot";
              mountOptions = [
                "umask=0077"
              ];
            };
          };
          root = {
            size = "100%";
            content = {
              type = "filesystem";
              format = "ext4";
              extraArgs = [
                "-L"
                "nixos"
              ];
              mountpoint = "/";
            };
          };
        };
      };
    };
  };
  nix.package = lib.mkDefault pkgs.lixPackageSets.stable.lix;
  nix.settings = {
    experimental-features = [
      "nix-command"
      "flakes"
    ];
  };
  nix.gc = {
    automatic = true;
    dates = "weekly";
    options = "--delete-older-than 14d";
  };
  nix.optimise = {
    automatic = true;
    dates = ["weekly"];
  };
  nixpkgs.config.allowUnfree = true;
  boot.loader.efi.canTouchEfiVariables = false;
  boot.loader.grub = {
    enable = true;
    efiSupport = true;
    efiInstallAsRemovable = true;
  };
  boot.kernelParams = ["console=ttyS0,115200n8"];
  networking.hostName = "hetzner";
  networking.useDHCP = lib.mkDefault true;
  networking.firewall = {
    allowPing = true;
    allowedTCPPorts = [
      22
      80
      443
    ];
  };
  services.openssh = {
    enable = true;
    openFirewall = true;
    settings = {
      PermitRootLogin = "prohibit-password";
    };
  };
  services.qemuGuest.enable = true;
  services.fstrim.enable = true;
  security.sudo.wheelNeedsPassword = false;
  time.timeZone = "America/Los_Angeles";
  i18n.defaultLocale = "en_US.UTF-8";
  environment.systemPackages = with pkgs; [
    btop
    curl
    git
    vim
    wget
  ];
  system.stateVersion = "25.11";
}

Then you can run nix run .#nixos-anywhere -- --build-on remote --flake .#hetzner root@<hetzner-ip> to get it to install automatically!
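For reference, the command above only works if the flake exposes both a hetzner NixOS configuration and a nixos-anywhere package. Here is a minimal sketch of what such a flake.nix could look like. The nixpkgs branch, the path to the host config, and the choice to re-export nixos-anywhere as a package are all assumptions on my part for illustration, not necessarily how you would lay out your own repo.

```nix
{
  description = "Sketch of a flake for the hetzner host (layout is an assumption)";

  inputs = {
    # Assumption: use whichever nixpkgs branch suits your stateVersion.
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

    # Provides the disko.devices options used in the host config.
    disko.url = "github:nix-community/disko";
    disko.inputs.nixpkgs.follows = "nixpkgs";

    nixos-anywhere.url = "github:nix-community/nixos-anywhere";
  };

  outputs = { nixpkgs, disko, nixos-anywhere, ... }: let
    # Assumption: both the server and the machine you deploy from are
    # x86_64-linux. Add more systems if you deploy from elsewhere.
    system = "x86_64-linux";
  in {
    # Target of `--flake .#hetzner`.
    nixosConfigurations.hetzner = nixpkgs.lib.nixosSystem {
      inherit system;
      modules = [
        disko.nixosModules.disko
        ./hosts/hetzner/configuration.nix # hypothetical path to the host config
      ];
    };

    # Re-export the installer so that `nix run .#nixos-anywhere` resolves.
    packages.${system}.nixos-anywhere =
      nixos-anywhere.packages.${system}.nixos-anywhere;
  };
}
```

Pinning nixos-anywhere as an input means the installer version is locked in flake.lock alongside everything else. Running nix run github:nix-community/nixos-anywhere directly should work as well if you would rather not track it.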
Migrating Services
At this point, your server should be switched over to NixOS. From here on, you can keep working on your config and apply it by running nixos-rebuild switch --flake .#hetzner --target-host root@<hetzner-ip> --build-host root@<hetzner-ip>, which means it's time to migrate the services over.
There are no critical production services on my cluster per se, so I opted for the easiest way to migrate them: shut each one down, move it over, and cut over the domain name. They all pretty much follow the same procedure: scale down the deployments, detach the disk volume from the k8s agent, attach it to the new server, copy the data to the new volume, start the service on the new server, and finally point the domain name over.
To deploy secrets, I used agenix so I can encrypt the secrets and just track them in the repo.
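For the "attach to the new server, copy data" step, one option is to declare the old volume as a temporary read-only mount in the host config, so it can be removed again once the copy is done. This is only a sketch: the mount point is made up, the volume ID is a placeholder you would need to fill in, and I'm assuming the volume was formatted as ext4. Mounting it by hand with mount works just as well for a one-off copy.

```nix
{
  # Temporary mount for the volume detached from the old cluster.
  # Hetzner Cloud volumes show up under /dev/disk/by-id/scsi-0HC_Volume_<id>.
  fileSystems."/mnt/old-data" = {
    device = "/dev/disk/by-id/scsi-0HC_Volume_<volume-id>"; # placeholder ID
    fsType = "ext4"; # assumption: check what the old volume actually uses
    options = [
      "ro" # read-only, so the source data cannot be modified by accident
      "nofail" # do not block boot once the volume is detached again
    ];
  };
}
```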
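As a rough illustration of how this looks on the NixOS side, here is a minimal sketch of a module that declares one encrypted secret and hands it to a service. The service name, secret name, and file path are all hypothetical, and it assumes agenix.nixosModules.default has been added to the host's module list.

```nix
{ config, ... }: {
  # The .age file is encrypted and safe to commit. agenix decrypts it on
  # the server at activation time using the host's SSH key.
  age.secrets.myservice-env = {
    file = ../../secrets/myservice-env.age; # hypothetical path in the repo
  };

  # Hypothetical service that reads its credentials from the decrypted file.
  # systemd loads EnvironmentFile itself, so the default root-only
  # permissions on the decrypted secret are sufficient here.
  systemd.services.myservice.serviceConfig.EnvironmentFile =
    config.age.secrets.myservice-env.path;
}
```

The agenix CLI additionally expects a secrets.nix file that lists which public keys each .age file is encrypted for. The server's SSH host key has to be among them for decryption to work on the server.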
Using LLM to Migrate Services
I collapsed this section by default as I understand some people do not like the use of LLMs.
I did use codex to help with this process. I basically gave it access to the cluster via kubectl and asked it to get a lay of the land and plan a migration. Then I told it to do things one by one, including writing a script that fetches secrets from Kubernetes and pipes them into agenix for encryption (the secrets are neither piped to stdout nor saved, so they should not end up in the LLM transcript). I needed to manually jump in a couple of times to detach/attach disk volumes. I'm sure I could automate this too, but I see it as a good checkpoint anyway. This saved me a lot of time translating a myriad of YAML files into Nix.
Final Results
The migration reduces my footprint from 4x CPX21 and various other associated resources totalling ~€55 down to just 1x CPX41, which is over-provisioned. We can probably run this on a single CPX21, which would cost just €12.
The cost was not the driving reason though. This setup is much easier to maintain and deploy. The old setup required two repos (terraform and k8s yamls) to maintain, while the new one is just a single Nix flake to keep track of.
We can probably also migrate some services off containers and use lightweight namespaces or even just plain systemd units.
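To give an idea of what that could look like, here is a minimal sketch of a service run as a plain systemd unit, with systemd's own sandboxing options (which are built on namespaces) in place of a container. The service name is hypothetical and Python's built-in HTTP server is only a stand-in for a real workload.

```nix
{ pkgs, ... }: {
  systemd.services.myservice = {
    description = "Hypothetical service running directly under systemd";
    wantedBy = [ "multi-user.target" ];
    after = [ "network-online.target" ];
    wants = [ "network-online.target" ];

    serviceConfig = {
      # Stand-in workload: serve the state directory on localhost only.
      ExecStart = "${pkgs.python3}/bin/python3 -m http.server 8080 --bind 127.0.0.1 --directory /var/lib/myservice";
      Restart = "on-failure";

      # Run as a throwaway user allocated by systemd, with a private
      # writable directory at /var/lib/myservice.
      DynamicUser = true;
      StateDirectory = "myservice";

      # Sandboxing: read-only view of the OS, no access to /home,
      # private /tmp, and no privilege escalation.
      ProtectSystem = "strict";
      ProtectHome = true;
      PrivateTmp = true;
      NoNewPrivileges = true;
    };
  };
}
```

Running systemd-analyze security myservice on the server gives a quick overview of how tightly a unit like this is locked down.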