Devops on Project Wintermute

k3s on Hetzner: notes from running production clusters

Thu, 05 Mar 2026 14:00:00 +0200

TL;DR. k3s on Hetzner is a strong cost-control move when you are willing to operate the cluster. Mind the Flannel MTU on Hetzner private networks, separate stateless and stateful workloads at the storage layer, keep observability minimal but real, and treat backups as a tested practice rather than a config setting.

A managed Kubernetes service is the right answer for most teams. When it is not (because of cost, control, or data locality), self-hosted k3s on a low-cost provider like Hetzner is one of the better options. We have run several clusters of this shape in production for over a year. This post collects the decisions that have held up.

Speeding up GitHub Actions lint pipelines for large Go codebases

Thu, 12 Feb 2026 10:00:00 +0200

TL;DR. Lint on a large Go monorepo went from 63 seconds to about 25 seconds on warm cache, with macOS skipped on branches. Five changes: concurrency group, conditional OS matrix, combined cache restore and save, explicit go mod download, and incremental golangci-lint --new-from-rev. None require a self-hosted runner.

On a large Go codebase, the CI lint stage is the part developers feel most, because it runs on every push, on every branch. Lint feedback that takes a minute or more kills iteration speed and quietly trains people to push less often, which is the opposite of what you want.
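As a rough illustration of how the five changes from the TL;DR could fit together in one workflow, here is a minimal sketch. It is not the workflow from the post: the workflow name, the `main` branch name, the action versions, the cache paths (Linux defaults), and the assumption that `golangci-lint` is already on the runner's PATH are all placeholders chosen for the example.

```yaml
# Hypothetical sketch; names, versions, paths, and the "main" branch are assumptions.
name: lint

on: push

# 1. Concurrency group: a newer push to the same ref cancels the older run.
concurrency:
  group: lint-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint:
    strategy:
      matrix:
        # 2. Conditional OS matrix: macOS runs only on main, branches get Linux only.
        os: ${{ fromJSON(github.ref == 'refs/heads/main' && '["ubuntu-latest","macos-latest"]' || '["ubuntu-latest"]') }}
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history, so the base revision for --new-from-rev exists locally

      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
          cache: false # caching is handled by the single step below

      # 3. Combined cache restore and save: one step restores at the start
      #    and saves at the end of the job.
      - uses: actions/cache@v4
        with:
          # Linux default locations; macOS keeps its build cache elsewhere.
          path: |
            ~/go/pkg/mod
            ~/.cache/go-build
            ~/.cache/golangci-lint
          key: lint-${{ runner.os }}-${{ hashFiles('**/go.sum') }}
          restore-keys: |
            lint-${{ runner.os }}-

      # 4. Explicit module download, kept separate from the lint step.
      - run: go mod download

      # 5. Incremental lint: report only issues introduced since the base revision.
      #    Assumes golangci-lint is already installed on the runner.
      - run: golangci-lint run --new-from-rev=origin/main
```

One design note on the sketch: `--new-from-rev` compares against a git revision, so a shallow checkout would leave it without a base to diff against. That is why the checkout step fetches full history here.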

Anatomy of a 6-hour Kubernetes ingress outage

Mon, 09 Feb 2026 12:00:00 +0200

TL;DR. A backend deployment lost all healthy pods. nginx active health checks marked the upstream pool empty. That pool was the default_server for ports 80 and 443, so every unmatched hostname returned 502 for 6 hours and 38 minutes. The trigger was Kubernetes-side. The blast radius was a configuration choice we made years ago. The post-incident fixes were almost all on the nginx side.

We had a P0 outage on a public ingress tier. Two redundant nginx instances, both showing the same symptoms, both serving production traffic to dozens of hostnames. This is the writeup, sanitised and reduced to the parts that generalise.

Services

Sun, 13 Feb 2022 11:11:11 -0100

Below is what we actually do day to day. We try to keep the list short and the descriptions honest. If something here matches what you need, get in touch.

Software engineering

We write backend services in Java, Kotlin, Go, and Groovy. Most of the work falls into a few buckets: REST and gRPC APIs, distributed systems handling high request volumes, and the occasional batch job that has to be reliable rather than fast.