I host my own website and personal projects (like bunnies.io), because it’s fun, and I like being in control of what I’m running. I used to use Nomad for the cluster control plane, and Cloudflare as the CDN, exposing services using Cloudflare Tunnels. Now the cluster uses Kubernetes (via k3s), Cilium for cluster networking, and BunnyCDN. This post details what the new cluster looks like at a high level, and includes discussion about the choices and interesting technologies used.
There are a few drivers behind my choice to reevaluate my setup. HashiCorp’s change of license[1] to something less open is quite unfortunate, and coupled with the sale of the company to IBM (and the inevitable focus on enterprise features and licensing), I thought it was a good time to reevaluate my use of Nomad. I knew that I wanted to play with Kubernetes, and had some experience from my work. I also wanted to run my personal workloads with providers more closely aligned with my home jurisdiction, in the European Union. I strongly believe that the EU should invest in building up its own technology ecosystem, to provide more options for data sovereignty, and help the bloc become more self-sufficient - and it’s good for me to have first-hand experience with it. It’s also nice to encourage others to think about their own data sovereignty and make more conscious choices about where to host their stuff.
I have too many side-projects to remember every nuance of each of them, especially when switching between them frequently. I am therefore quite strict about having everything defined using Terraform (“infrastructure as code”), and by repeatable scripts that are idempotent: they can be run again at any time and only make forward progress. Both of these things mean disaster recovery is reasonably straightforward, from an infra perspective - the whole thing can be shut down and re-applied into a working state, in a semi-automated way.
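A minimal sketch of what "idempotent, forward-progress" scripting can look like, written in Python purely for illustration. The function names and the example file are hypothetical and are not taken from the actual scripts; the point is only that each step checks the current state first and reports whether it changed anything, so a second run is a no-op.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_file(path: Path, content: str) -> bool:
    """Write content only if the file is missing or differs. Returns True on change."""
    if path.exists() and path.read_text() == content:
        return False  # already in the desired state, nothing to do
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True


def run_steps(steps: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run every step in order and return the names of the steps that changed something."""
    changed = []
    for name, step in steps:
        if step():
            changed.append(name)
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical config file, used only to demonstrate the behaviour.
        target = Path(tmp) / "conf" / "example.conf"
        steps = [("write-config", lambda: ensure_file(target, "mode=agent\n"))]
        print(run_steps(steps))  # first run: the step reports a change
        print(run_steps(steps))  # second run: nothing left to do
```

Because every step is safe to repeat, a run that fails halfway can simply be started again from the top.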
I decided it wasn’t worth exploring OpenTofu[2] right now, mostly because I didn’t want the project to balloon in complexity any more than it already had - though any future migration should be pretty straightforward.
I was already using Hetzner, and I knew they develop a bunch of Kubernetes-related plugins, including a first-party “cloud controller manager”[3] plugin which I could use to get topology labels automatically applied to each of the cluster nodes. Hetzner are also European, and have points of presence in countries near to me in Sweden. They’re also very price-competitive, even as component costs skyrocket - so it was an easy choice to continue using them.
cloud-init scripts install k3s and Cilium when the new server launches, and configure each new node to operate either as a server (control plane) or as an agent (for workloads). Security updates are installed automatically, and I have a script for doing frequent rolling reboots.
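The rolling reboot script itself isn't shown here, so the following is only a sketch of the general pattern, assuming nodes are handled strictly one at a time: drain, reboot, then uncordon. `kubectl drain` and `kubectl uncordon` are real commands; `reboot_and_wait` is a hypothetical callback standing in for however the machine is actually rebooted and confirmed healthy again.

```python
import subprocess
from typing import Callable, Iterable


def kubectl(*args: str) -> None:
    """Run a kubectl command and raise if it fails."""
    subprocess.run(["kubectl", *args], check=True)


def rolling_reboot(
    nodes: Iterable[str],
    reboot_and_wait: Callable[[str], None],
    run: Callable[..., None] = kubectl,
) -> None:
    """Reboot nodes one at a time so only a single node is ever out of service.

    Any exception aborts the loop, leaving the current node cordoned for
    inspection and the remaining nodes untouched.
    """
    for node in nodes:
        # Evict workloads; DaemonSet pods stay, emptyDir data is discarded.
        run("drain", node, "--ignore-daemonsets", "--delete-emptydir-data")
        # Hypothetical hook: reboot the machine and block until it is back.
        reboot_and_wait(node)
        # Allow the scheduler to place workloads on the node again.
        run("uncordon", node)
```

Passing `run` and `reboot_and_wait` in as parameters keeps the ordering logic testable without touching a real cluster.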
Kubernetes is much more widely used in industry than Nomad, and learning how to administer a cluster could be useful for my work, so migrating to it was an easy choice.
After some research I discovered k3s[4], which had some very appealing properties:
At work I had already discussed a cool cluster networking project called Cilium[5], which uses eBPF as a core technology. After a little research it appeared to be a very compelling “drop in” solution for all the cluster networking stuff, and it recently gained support for the Kubernetes Gateway API.
In my previous cluster I had used Ansible, but this time I was reluctant to repeat it, so I spent some time figuring out a way to keep the infrastructure in Terraform and then use a Kubernetes-native tool to manage the cluster. I settled on using Kustomize[6] - cluster state can be defined much like Terraform, and then applied (including using server-side applies for state management).
My Kustomize scripts are split into sections:
So the Kustomize folder structure looks like this:
kustomize
- 00-crds
  - gateway-api
- 01-cilium
  - cilium
- 10-infra
  - cert-manager
  - datadog-operator
  - hccm
  - hcloud-csi
  - namespaces
  - network-policy
- 15-datadog-agent
- 20-gateway
- 30-apps
  - bunnies-io
  - lopcode-com
  - shared-redis
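The numeric prefixes suggest an apply order (CRDs first, apps last). As a rough sketch, assuming each numbered folder contains its own kustomization.yaml (which the post doesn't confirm), the commands could be generated like this. `kubectl apply --server-side -k <dir>` is a real invocation; the helper name is made up for the example.

```python
def apply_commands(layers: list[str], root: str = "kustomize") -> list[list[str]]:
    """Build one server-side apply command per layer, in prefix order.

    Two-digit prefixes mean a plain lexical sort matches the numeric order.
    """
    return [
        ["kubectl", "apply", "--server-side", "-k", f"{root}/{layer}"]
        for layer in sorted(layers)
    ]


if __name__ == "__main__":
    layers = ["30-apps", "00-crds", "10-infra", "01-cilium", "20-gateway", "15-datadog-agent"]
    for command in apply_commands(layers):
        print(" ".join(command))
```

Applying in this order means a layer's prerequisites, such as the Gateway API CRDs, already exist by the time later layers reference them.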
I was quite fond of the built-in Nomad UI, but there’s nothing similar built in to Kubernetes. I did however find Headlamp[7], which gives a nice visual overview to complement the CLI-based tools, and it’s developed in a Kubernetes SIG (special interest group). I use it quite a lot these days.
In terms of cluster topology, I’m running 3x control-plane nodes and 3x agent nodes, using a mixture of Hetzner CAX21 and CAX31 types, spread across the fsn1, nbg1 and hel1 regions. Failure in any one of those regions will cause workloads to automatically migrate, and the control-plane nodes can tolerate at least one node being unavailable.
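The failure tolerance of the control plane can be worked out with a little arithmetic. This sketch assumes the control-plane datastore needs a strict majority of members to stay available (as etcd does); the post doesn't say which datastore is in use, so treat that as an assumption.

```python
def tolerated_failures(members: int) -> int:
    """Number of members that can be lost while a strict majority remains."""
    quorum = members // 2 + 1  # smallest strict majority
    return members - quorum


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(n, "members tolerate", tolerated_failures(n), "failure(s)")
```

Under that assumption three members tolerate one failure, and a fourth member adds no extra tolerance, which is why odd member counts are the usual choice.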
Although most things are self-healing in the setup described above, things can and will go wrong, and knowing when that happens is important. In the future I’d like to explore self-hosting Grafana, but I have quite a lot of experience with Datadog, who provide a generous free tier and a European hosting option, so I decided to stick with it.
It was great to discover that Datadog has excellent Kubernetes support, and allows observation at the node, cluster, and workload levels, all nicely integrated into one place.
Some simple monitors send me a ping if there are persistent Pod-related issues causing workload failures, if nodes disappear unexpectedly, and other basics.
Hubble is a Cilium tool to visualise network flows. I can’t really do it justice by describing it, but I’ve included a screenshot. It was very easy to enable with a flag when deploying Cilium, and it’s been an extremely powerful tool to help figure out why packets don’t get from one place to another, in either direction (ingress or egress).
BunnyCDN is a CDN provider built in Europe, and has matured a lot over the last few years since I last tried it. It’s also rabbit themed which gives it a huge advantage over other CDN providers.
Each project has its own “pull zone” defined, and bunnies.io gets a special storage-backed zone in the “high volume” pricing tier - that project sometimes pushes around a terabyte of rabbit-related media a month, and the volume tier makes that much more affordable.
An unexpected challenge with BunnyCDN is that they don’t maintain a way to get up-to-date lists of IPs for their infrastructure. I would ideally like to prevent anything from reaching the origin servers without going through BunnyCDN at an IP level, but this doesn’t seem to be technically possible to achieve right now. Instead, requests from the CDN carry a simple API key, and the origins return 403 errors if it isn’t present.
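A minimal sketch of that kind of origin check, assuming the key arrives in a request header. The header name `x-origin-key` and the function are hypothetical; where the real check lives (application, gateway, or elsewhere) isn't specified here.

```python
import hmac


def origin_status(
    headers: dict[str, str],
    expected_key: str,
    header_name: str = "x-origin-key",  # hypothetical header name
) -> int:
    """Return 200 if the request carries the shared key, otherwise 403."""
    if not expected_key:
        return 403  # refuse everything if no key is configured
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(header_name.lower(), "")
    # Constant-time comparison avoids leaking the key through timing.
    if hmac.compare_digest(supplied.encode(), expected_key.encode()):
        return 200
    return 403
```

This only proves that the caller knows the key, not that the request came from the CDN's network, so it is a weaker guarantee than an IP allowlist would be.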
I’ve also configured spend limits in the dashboards to prevent abuse from bankrupting me - if something happens, I’d rather the site just went down until I figure out what to do.
My new setup is more “industry standard”, using Kubernetes instead of Nomad, which I hope will make troubleshooting easier, and I’ll be able to jump on cool new features as they’re released - k8s is a very active project and ecosystem. I’m also pleased I managed to integrate some cutting-edge technologies, using Cilium to replace all of the k8s networking layer stuff - and I hope they continue to improve their support of the new Gateway API. The management of TLS routes and certificates is a little boilerplate-heavy, but good things are coming in the future via ListenerSet support.
If there’s a specific aspect of my new setup that you’re interested in, let me know, and I’ll expand on it in this post.
Hosting my own stuff continues to be fun, and it’s a nice way to learn my way around new technologies without the pressure or consequences involved with my work. For my next project, I’m going to reevaluate how I build my website, hopefully going more custom with a Helidon and JTE-based backend.
[1] HashiCorp change to BUSL license - https://www.hashicorp.com/en/license-faq#aug-10-announcement
[2] OpenTofu - CNCF-backed Terraform fork - https://opentofu.org/
[3] HCCM - https://github.com/hetznercloud/hcloud-cloud-controller-manager/
[4] k3s - lightweight Kubernetes - https://k3s.io/
[5] Cilium - eBPF-based cluster networking - https://cilium.io/
[6] Kustomize - Kubernetes-native configuration management - https://kustomize.io/
[7] Headlamp - user-friendly Kubernetes UI - https://headlamp.dev/