When Kubernetes Nodes Exceed 1000

TL;DR

  1. When a Kubernetes cluster exceeds 1000 nodes, the number of node-exporter Pods also exceeds 1000, because node exporter runs as a DaemonSet with one Pod per node.
  2. When Prometheus Operator discovers targets through a ServiceMonitor, it reads the Service’s Endpoints object by default.
  3. Kubernetes caps a single Endpoints object at 1000 addresses and truncates the rest.
  4. As a result, Prometheus keeps only 1000 node-exporter scrape targets, and the remaining nodes are not scraped.
  5. Prometheus should therefore use EndpointSlices instead of Endpoints for service discovery.
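The effect described in the TL;DR can be illustrated with a toy model. This is a minimal sketch, not Kubernetes or Prometheus code: it assumes the Kubernetes defaults of truncating an Endpoints object at 1000 addresses and packing at most 100 endpoints into each EndpointSlice, and it assumes node exporter listens on its usual port 9100. The function names are hypothetical. In a plain Prometheus configuration, switching sources corresponds to `role: endpointslice` instead of `role: endpoints` in `kubernetes_sd_configs`.

```python
# Toy model of why Endpoints-based discovery loses targets past 1000 nodes.
# Assumed limits (Kubernetes defaults, not read from a real cluster):
ENDPOINTS_MAX_ADDRESSES = 1000  # Endpoints controller truncates beyond this
MAX_ENDPOINTS_PER_SLICE = 100   # default max endpoints per EndpointSlice
NODE_EXPORTER_PORT = 9100       # assumed node exporter port


def build_endpoints(pod_ips):
    """Hypothetical model of one Endpoints object: addresses past the cap are dropped.

    The real controller does not promise which 1000 it keeps; here we keep the first 1000.
    """
    return pod_ips[:ENDPOINTS_MAX_ADDRESSES]


def build_endpoint_slices(pod_ips, per_slice=MAX_ENDPOINTS_PER_SLICE):
    """Hypothetical model of EndpointSlices: the same addresses split into small objects."""
    return [pod_ips[i:i + per_slice] for i in range(0, len(pod_ips), per_slice)]


def scrape_targets_from_endpoints(endpoints, port=NODE_EXPORTER_PORT):
    """One scrape target per address in the (possibly truncated) Endpoints object."""
    return [f"{ip}:{port}" for ip in endpoints]


def scrape_targets_from_slices(slices, port=NODE_EXPORTER_PORT):
    """One scrape target per address across all EndpointSlices."""
    return [f"{ip}:{port}" for s in slices for ip in s]


# Simulate a 1500-node cluster with one node-exporter Pod per node.
node_count = 1500
pod_ips = [f"10.0.{i // 256}.{i % 256}" for i in range(node_count)]

endpoints = build_endpoints(pod_ips)
slices = build_endpoint_slices(pod_ips)

print(len(scrape_targets_from_endpoints(endpoints)))       # 1000
print(len(slices), len(scrape_targets_from_slices(slices)))  # 15 1500
```

The design point the model captures is that EndpointSlices shard addresses across many objects, so no single object hits the cap, while the legacy Endpoints object silently drops everything past 1000.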

As Kubernetes clusters grow and the number of nodes exceeds 1000, various challenges arise. One particularly important issue from a monitoring perspective is Prometheus Service Discovery.

GreptimeDB as Prometheus Long-term Storage

Is GreptimeDB suitable as a long-term storage solution for Prometheus? To find an answer to this question, I set up a simple configuration of GreptimeDB (v0.13) to investigate.

What is GreptimeDB?

GreptimeDB is an open-source cloud-native time series database that integrates metrics, logs, and events.

Some notes about Cortex architecture

Cortex

Cortex, started by Tom Wilkie and Julius Volz (Prometheus’ co-founder) in 2016, has several interesting architectural features. Originally built as long-term storage for Prometheus, Cortex has since served as a reference architecture for many time-series-based storage systems for other signals such as traces and logs, especially in the Grafana stack.

Prometheus 101 (slide) and Graphite

Prometheus 101

slide: Prometheus 101

slide: query

slide: range vector

I created a simple presentation about Prometheus and published it with sporto/hugo-remark, a Hugo theme for remark.js. Writing the slides in Markdown rather than PowerPoint let me focus more on the content. (That doesn’t necessarily mean the content is better, though.)