When Kubernetes Nodes Exceed 1000

TL;DR

  1. When a cluster grows past 1000 nodes, the number of node-exporter pods grows with it, because node exporter runs as a DaemonSet with one pod per node.
  2. When Prometheus Operator discovers targets through a ServiceMonitor, it reads the Service's Endpoints object by default.
  3. A Kubernetes Endpoints object holds at most 1000 addresses by default; addresses beyond that are truncated.
  4. As a result, Prometheus maintains only 1000 scrape targets, and the remaining node exporters are never scraped.
  5. Prometheus should use EndpointSlices instead of Endpoints for service discovery.
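The effect in points 3 to 5 can be illustrated with a small simulation. This is a hypothetical sketch, not Prometheus or Kubernetes code: the 1000-address cap stands in for the Endpoints limit, 100 is assumed to be kube-controller-manager's default `--max-endpoints-per-slice`, and the pod IPs are invented. Which addresses the real controller keeps when it truncates is not modeled here.

```python
# Hypothetical simulation of how many node-exporter targets each
# discovery source exposes. Both constants are assumptions modeled on
# Kubernetes defaults; the pod IPs are made up.
ENDPOINTS_MAX_ADDRESSES = 1000   # addresses kept in one Endpoints object
MAX_ENDPOINTS_PER_SLICE = 100    # assumed default --max-endpoints-per-slice


def fake_pod_ips(node_count: int) -> list[str]:
    """One node-exporter pod per node (DaemonSet); IPs valid up to 65,536 nodes."""
    return [f"10.0.{i // 256}.{i % 256}" for i in range(node_count)]


def endpoints_targets(ips: list[str]) -> list[str]:
    """Endpoints: a single object per Service, truncated at the cap."""
    return ips[:ENDPOINTS_MAX_ADDRESSES]


def split_into_slices(ips: list[str]) -> list[list[str]]:
    """EndpointSlices: the same addresses spread over several small objects."""
    return [ips[i:i + MAX_ENDPOINTS_PER_SLICE]
            for i in range(0, len(ips), MAX_ENDPOINTS_PER_SLICE)]


if __name__ == "__main__":
    ips = fake_pod_ips(1500)
    slices = split_into_slices(ips)
    print("Endpoints targets:", len(endpoints_targets(ips)))
    print("EndpointSlice targets:", sum(len(s) for s in slices),
          "in", len(slices), "slices")
```

With 1500 nodes, the Endpoints path keeps 1000 targets, while the slices carry all 1500 addresses across 15 objects. Splitting the addresses across many small objects is what removes the single-object ceiling.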

As Kubernetes clusters grow and the number of nodes exceeds 1000, various challenges arise. One particularly important issue from a monitoring perspective is Prometheus Service Discovery.

A toy tool for generating Kubernetes manifests in Rust

My main criticism of Helm is that it generates Kubernetes resources by composing YAML with text templates. That said, the template/values pattern of Helm charts also appears in kube-prometheus's Jsonnet, and complex configurations do need some way to be abstracted and managed.
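To make the criticism concrete, here is a small Python illustration. It is not the Rust tool from the post. The values and template are hypothetical, loosely in the style of a Helm `values.yaml`. Text templating leaves indentation to the template author, so one wrong indent silently changes the structure. Building the manifest as data and serializing it guarantees the nesting.

```python
import json

# Hypothetical values, in the spirit of a Helm values.yaml.
values = {"name": "web", "labels": {"app": "web", "tier": "frontend"}}

# Text templating: the author must get YAML indentation right by hand.
TEMPLATE = """metadata:
  name: {name}
  labels:
{labels}"""


def render_text(values: dict, indent: int) -> str:
    """Splice label lines into the template with a caller-chosen indent."""
    labels = "\n".join(f"{' ' * indent}{k}: {v}"
                       for k, v in values["labels"].items())
    return TEMPLATE.format(name=values["name"], labels=labels)


def render_structured(values: dict) -> str:
    """Build the manifest as data; the serializer handles nesting."""
    manifest = {"metadata": {"name": values["name"],
                             "labels": dict(values["labels"])}}
    # JSON output is also valid YAML, so it can be applied as a manifest.
    return json.dumps(manifest, indent=2)
```

With `indent=2`, `app: web` ends up as a sibling of `labels` instead of a child. That is still valid YAML, so the mistake goes unnoticed. `indent=4` is correct, but nothing in the template enforces it.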

Jsonnet: The Good, the Bad, and the Meh

Each solution is the root of the next problem – Gerald M. Weinberg

I’ve been using Jsonnet for several years, and I think it would be good to summarize my experiences so far.

The good

The best thing about Jsonnet is that it's a superset of JSON. As a data templating language for generating JSON, it adds many programming-language features (variables, functions, arithmetic operations, conditionals). And because any JSON document can also be written out as YAML, it can generate YAML too, which is why I use it with Tanka to create Kubernetes manifests.
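The step from "generates JSON" to "generates YAML" works because every JSON value tree maps directly onto YAML. Below is a minimal sketch of that mapping in Python. It is only an illustration: Jsonnet has its own serializers (for example `std.manifestYamlDoc`), and Tanka handles this for you. The function name `to_yaml` is hypothetical.

```python
import json
from typing import Any


def _is_block(value: Any) -> bool:
    # Non-empty dicts and lists are emitted in block style; scalars,
    # {} and [] are emitted inline as JSON, which YAML also accepts.
    return isinstance(value, (dict, list)) and len(value) > 0


def to_yaml(value: Any, indent: int = 0) -> str:
    """Emit a JSON-compatible value as block-style YAML (sketch)."""
    pad = "  " * indent
    if not _is_block(value):
        return pad + json.dumps(value)
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            # Keys and scalars are JSON-quoted, which is valid YAML and
            # avoids surprises like a bare `on` being read as a boolean.
            head = f"{pad}{json.dumps(str(key))}:"
            if _is_block(item):
                lines.append(head)
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{head} {json.dumps(item)}")
    else:
        for item in value:
            if _is_block(item):
                lines.append(f"{pad}-")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}- {json.dumps(item)}")
    return "\n".join(lines)
```

For example, `to_yaml({"kind": "ConfigMap", "data": {"a": "1"}})` yields three lines: `"kind": "ConfigMap"`, `"data":`, and an indented `"a": "1"`. Quoting everything is less pretty than what real tools emit, but it keeps the sketch correct without a YAML library.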

kroller: a tiny (restart) tool to help with Kubernetes cluster upgrades

Following the Kubernetes architecture, Kubernetes upgrades (especially on EKS) fall into two types:

  • Control plane upgrade (+ etcd)
  • Node upgrade

With a cloud-managed service like EKS in particular, AWS manages the control plane, so what you handle directly is mostly the node upgrade (unless you use managed node groups).
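The post doesn't describe how kroller works internally, so the following is only a hypothetical sketch of one way a restart-based node upgrade could be planned. First cordon the nodes still on the old version. Then `kubectl rollout restart` each workload that has pods on those nodes, so that the replacement pods schedule onto new nodes. The `Pod` model is a simplification: `owner_*` stands for the top-level workload, while a real Deployment's pods are owned by a ReplicaSet.

```python
from dataclasses import dataclass

# Workload kinds whose pods can move by restarting them. DaemonSet pods
# are tied to their node, so restarting them moves nothing.
RESTARTABLE_KINDS = {"Deployment", "StatefulSet"}


@dataclass(frozen=True)
class Pod:
    namespace: str
    owner_kind: str   # top-level workload kind (simplification)
    owner_name: str
    node: str


def workloads_to_restart(pods: list[Pod], old_nodes: set[str]) -> list[tuple[str, str, str]]:
    """Deduplicated (namespace, kind, name) of workloads on old nodes."""
    targets = {
        (p.namespace, p.owner_kind, p.owner_name)
        for p in pods
        if p.node in old_nodes and p.owner_kind in RESTARTABLE_KINDS
    }
    return sorted(targets)


def plan(pods: list[Pod], old_nodes: set[str]) -> list[str]:
    """Cordon old nodes first so restarted pods cannot land back on them."""
    commands = [f"kubectl cordon {node}" for node in sorted(old_nodes)]
    commands += [
        f"kubectl rollout restart {kind.lower()}/{name} -n {ns}"
        for ns, kind, name in workloads_to_restart(pods, old_nodes)
    ]
    return commands
```

Emitting commands rather than calling the API keeps the sketch dependency-free and makes the plan easy to review before running it.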