TL;DR
- When a Kubernetes cluster grows past 1000 nodes, the number of node exporter pods grows past 1000 too, because node exporter runs as a DaemonSet with one pod per node.
- When Prometheus Operator performs service discovery through a ServiceMonitor, it reads the Service's Endpoints object by default.
- A Kubernetes Endpoints object holds at most 1000 addresses; any beyond that are truncated.
- As a result, Prometheus keeps only 1000 scrape targets, and the remaining node exporters are never scraped.
- To discover every target, Prometheus should use EndpointSlices instead of Endpoints for service discovery.
As Kubernetes clusters grow and the number of nodes exceeds 1000, various challenges arise. One particularly important issue from a monitoring perspective is Prometheus Service Discovery.
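
The arithmetic behind the TL;DR can be sketched in a few lines of Python. This is only an illustrative model, not Kubernetes or Prometheus code: the function names are hypothetical, and the two constants are assumptions based on the 1000-address cap on an Endpoints object and the default of 100 endpoints per EndpointSlice.

```python
# Illustrative model only. The helpers below are hypothetical and do not call
# any Kubernetes or Prometheus API; the constants are assumed defaults.
ENDPOINTS_MAX_ADDRESSES = 1000      # assumed cap on a single Endpoints object
ENDPOINTSLICE_MAX_ENDPOINTS = 100   # assumed default endpoints per EndpointSlice


def targets_via_endpoints(node_count: int) -> int:
    """Scrape targets visible when discovery reads one Endpoints object."""
    return min(node_count, ENDPOINTS_MAX_ADDRESSES)


def missed_targets(node_count: int) -> int:
    """Node exporters that are silently dropped by the Endpoints cap."""
    return node_count - targets_via_endpoints(node_count)


def min_endpointslices(node_count: int) -> int:
    """Lower bound on the number of EndpointSlices needed to list every pod."""
    # Ceiling division: a partially filled slice still counts as one slice.
    return -(-node_count // ENDPOINTSLICE_MAX_ENDPOINTS)


if __name__ == "__main__":
    nodes = 1200  # one node exporter pod per node
    print("visible via Endpoints:", targets_via_endpoints(nodes))   # 1000
    print("missed via Endpoints:", missed_targets(nodes))           # 200
    print("EndpointSlices needed (at least):", min_endpointslices(nodes))  # 12
```

In this model a 1200-node cluster loses 200 targets when discovery goes through Endpoints, while EndpointSlices spread the same 1200 addresses across at least 12 smaller objects, so no single object hits a cap.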