The mysterious disapperance of Docker images
A node hosts a Gitlab runner and a small k3s cluster which runs a few services as regular kubernetes deployments. A CI job pinned to that runner builds Docker images for these services services, updates the image of the corresponding deployments, and starts a few system and acceptance tests. The CI job does not push those images to the in-house registry; to avoid polluting the registry with hundreds of images it just builds locally.
Each test then scales each deployment to zero replicas to effectively stop all services, clears the system’s underlying database, and scales the service deployments back to a small number of replicas sufficient for testing.
The whole thing runs fine until one day the replicas randomly fail to start.