How to collect container, pod, node, and cluster metrics in k8s?

Posted on 04/14/2025 (updated 04/23/2025) by user
Below is a detailed overview of how to collect metrics at different layers in a Kubernetes cluster—covering container-level, pod-level, node-level, and overall cluster-level metrics. We’ll focus on the most common, open-source approaches, although there are many commercial or cloud-specific variants that work similarly.

1. Collecting Container and Pod Metrics

cAdvisor (Container Advisor)

  • What It Is: A daemon that collects resource usage and performance characteristics of running containers.
  • Where It Runs: Typically embedded inside the Kubernetes kubelet process on each node.
  • Metrics Collected: CPU usage, memory usage, network I/O, filesystem I/O per container and pod.
  • Accessing Metrics: Exposed by the kubelet at https://<node-ip>:10250/metrics/cadvisor (authenticated). The legacy read-only endpoint http://<node-ip>:10255/metrics/cadvisor is deprecated and disabled by default on modern clusters.

Kubernetes uses cAdvisor under the hood, so you usually don’t install it separately—it’s already built into the kubelet. These cAdvisor metrics are then scraped by a metrics collector (e.g., Prometheus).

Prometheus Scraping

  • Prometheus is commonly used to scrape cAdvisor metrics.
  • How It Works:
    1. You install the Prometheus Operator or a standalone Prometheus instance in the cluster.
    2. You configure Prometheus to scrape the kubelet’s cAdvisor metrics endpoint (and other endpoints).
  • Collected Data: CPU, memory, disk, network usage for containers/pods.
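For reference, here is a minimal Prometheus scrape job for the kubelet's cAdvisor endpoint. This is a sketch assuming Prometheus runs in-cluster with service-account credentials mounted at the default paths; adapt the TLS settings to your cluster:

  scrape_configs:
    - job_name: "kubelet-cadvisor"
      scheme: https
      kubernetes_sd_configs:
        - role: node                      # discover one target per node (the kubelet)
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - target_label: __metrics_path__  # scrape /metrics/cadvisor instead of /metrics
          replacement: /metrics/cadvisor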

Metrics Server (For HPA)

  • What It Is: A lightweight, cluster-wide aggregator of resource usage data.
  • Primary Use Case: Used by Kubernetes’ Horizontal Pod Autoscaler (HPA) to scale workloads based on CPU/memory usage.
  • Data Source: Fetches metrics from Kubelets/cAdvisor, then makes them available via the metrics.k8s.io API.
  • Limitations: Designed for autoscaling, not for long-term storage or advanced analytics.
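To illustrate the HPA use case: once Metrics Server is serving the metrics.k8s.io API, a minimal HorizontalPodAutoscaler manifest looks like the following (the target Deployment name web is hypothetical):

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: web-hpa
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: web                        # hypothetical Deployment to scale
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70     # scale out when average CPU exceeds 70% of requests

You can also sanity-check Metrics Server with kubectl top nodes and kubectl top pods.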

2. Collecting Node Metrics

Kubelet’s /metrics Endpoint

  • What It Is: The kubelet itself exposes node metrics (e.g., CPU/memory usage of the node, runtime stats).
  • Where to Find:
    • https://<node-ip>:10250/metrics (secure, authenticated endpoint)
    • http://<node-ip>:10255/metrics (legacy read-only endpoint; deprecated and disabled by default on modern clusters)
  • Collected Data: Node-wide CPU usage, memory usage, runtime container stats (via cAdvisor integration).

Node Exporter (Prometheus)

  • What It Is: A Prometheus exporter that collects Linux system-level metrics.
  • How to Deploy: Typically deployed as a DaemonSet so that every node runs a Node Exporter container.
  • Collected Data: CPU, memory, disk usage, file system stats, network, etc., at the node level.
  • Scraping: Prometheus scrapes the Node Exporter endpoints, adding those metrics to the time-series database.
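A stripped-down Node Exporter DaemonSet might look like this (a sketch only; the kube-prometheus-stack Helm chart installs a more complete version for you, and the image tag is illustrative):

  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: node-exporter
    namespace: monitoring
  spec:
    selector:
      matchLabels:
        app: node-exporter
    template:
      metadata:
        labels:
          app: node-exporter
      spec:
        hostNetwork: true               # expose port 9100 on the node's own IP
        hostPID: true                   # needed by some process-level collectors
        containers:
          - name: node-exporter
            image: quay.io/prometheus/node-exporter:v1.8.1   # illustrative tag
            args: ["--path.rootfs=/host"]
            ports:
              - containerPort: 9100
                name: metrics
            volumeMounts:
              - name: rootfs
                mountPath: /host
                readOnly: true
        volumes:
          - name: rootfs
            hostPath:
              path: /                   # host filesystem, mounted read-only for disk stats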

3. Collecting Cluster Metrics & State

kube-state-metrics (KSM)

  • What It Is: A component that listens to the Kubernetes API and generates metrics about cluster objects.
  • Examples of Metrics:
    • Number of desired/available replicas in Deployments, DaemonSets, StatefulSets
    • Pod status, job status, node status
    • Resource quotas, limits, requests
  • How to Deploy: Install via Helm chart or YAML manifest. Usually deployed as a single Deployment, which listens to the API server.
  • Scraping: Prometheus scrapes /metrics endpoint of kube-state-metrics to retrieve cluster-level metrics.
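To give a flavor of the output, kube-state-metrics exposes samples like these in the Prometheus exposition format (the object names below are hypothetical):

  kube_deployment_status_replicas_available{namespace="default",deployment="web"} 3
  kube_pod_status_phase{namespace="default",pod="web-7d4b9cfd6-x2x9z",phase="Running"} 1
  kube_node_status_condition{node="node-1",condition="Ready",status="true"} 1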

Control Plane Metrics (API Server, Scheduler, Controller Manager)

  • API Server: Exposes metrics on the secure port, e.g., https://<apiserver>:6443/metrics.
  • Scheduler: Exposes metrics on its secure port :10259/metrics (the legacy insecure :10251 port has been removed in recent versions).
  • Controller Manager: Exposes metrics on its secure port :10257/metrics (the legacy insecure :10252 port has likewise been removed).
  • Scraping: Configure Prometheus to scrape these endpoints; this usually requires RBAC permissions and service-discovery settings for secure scraping.
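For example, the canonical Prometheus job for the API server discovers it through the endpoints of the default/kubernetes Service (again a sketch assuming in-cluster service-account credentials):

  - job_name: "kubernetes-apiservers"
    scheme: https
    kubernetes_sd_configs:
      - role: endpoints
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      # keep only the https endpoint of the kubernetes Service in the default namespace
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https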

Metric Flow

Node/Pod/Container → cAdvisor (inside the kubelet)
        ↓ scraped by
Prometheus ← also scrapes kube-state-metrics and Node Exporter
        ↓ queried by
Grafana dashboards

4. Putting It All Together with Prometheus

A common and recommended way to gather Kubernetes metrics at all levels (containers, pods, nodes, and cluster objects) is:

  1. Prometheus Operator:
    • Manages Prometheus, Alertmanager, and other CRDs (ServiceMonitor, PodMonitor).
    • Automatically discovers Kubernetes services (including kubelet, cAdvisor, kube-state-metrics, Node Exporter) based on labels or annotations.
  2. Components to Install:
    • Prometheus (for scraping all metrics).
    • Node Exporter (usually as a DaemonSet).
    • kube-state-metrics (as a Deployment).
    • (Optional) Metrics Server (for HPA functionality).
  3. Scrape Configurations:
    • ServiceMonitor and PodMonitor CRDs tell Prometheus which endpoints to scrape and on which ports.
    • For example, a ServiceMonitor might point at the kubelet's port 10250 for cAdvisor data (a minimal ServiceMonitor sketch follows this list).
  4. Storage & Retention:
    • Prometheus has an internal time-series database.
    • For longer-term storage or large-scale clusters, use Thanos, Cortex, or Mimir to extend Prometheus’ capabilities.
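As a concrete example, here is a minimal ServiceMonitor for a hypothetical application Service labeled app: my-app. The release label is an assumption: by default, the kube-prometheus-stack's Prometheus instance only selects ServiceMonitors carrying its Helm release name:

  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    name: my-app
    namespace: monitoring
    labels:
      release: my-prom-stack        # must match the Prometheus serviceMonitorSelector
  spec:
    selector:
      matchLabels:
        app: my-app                 # hypothetical label on the target Service
    namespaceSelector:
      matchNames:
        - default
    endpoints:
      - port: metrics               # named port on the Service
        interval: 30s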

5. Visualization & Dashboards

Once you have Prometheus collecting container, pod, node, and cluster metrics, you can visualize them:

  1. Grafana:
    • Very common with Prometheus.
    • Ships community dashboards for Kubernetes out of the box (cluster overview, node metrics, pod resource usage, etc.).
    • Additional dashboards available for kube-state-metrics, cAdvisor, Node Exporter, etc.
  2. Splunk Observability, Elastic Stack, Datadog, etc.:
    • You can forward Prometheus data (via OpenTelemetry Collector or Prometheus Remote Write) to these platforms.
    • Each platform typically provides dashboards and alerting for Kubernetes metrics.
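For instance, forwarding via Prometheus Remote Write only requires a remote_write block in the Prometheus configuration (the receiver URL and credentials below are hypothetical):

  remote_write:
    - url: https://metrics.example.com/api/v1/write   # hypothetical receiver endpoint
      basic_auth:
        username: my-tenant                           # illustrative credentials
        password_file: /etc/prometheus/secrets/remote-write-password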

6. Example Deployment Steps (Prometheus Stack)

Here’s a simplified example workflow using Helm:

  1. Add the repo and install the Prometheus stack:
     helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
     helm repo update
     # Install the kube-prometheus-stack (includes Prometheus, Alertmanager, Grafana, Node Exporter, kube-state-metrics, etc.)
     helm install my-prom-stack prometheus-community/kube-prometheus-stack \
       --namespace monitoring --create-namespace
  2. Confirm the pods are running:
     kubectl get pods -n monitoring
     You should see pods like:
    • Prometheus server
    • Alertmanager
    • Node Exporter (DaemonSet)
    • kube-state-metrics
    • Grafana
  3. Access Grafana:
    • By default, the Helm chart creates a Service for Grafana.
    • You can port-forward to it and log in, or expose it through an Ingress.
    kubectl port-forward svc/my-prom-stack-grafana 3000:80 -n monitoring
    Then open http://localhost:3000.
  4. Dashboards:
    • Grafana has built-in “Kubernetes / Compute Resources” dashboards when using the kube-prometheus-stack.
    • You can also import community dashboards from Grafana.com.

7. Additional Best Practices

  1. RBAC & Security:
    • Secure access to kubelet metrics (/metrics/cadvisor).
    • Use SSL/TLS if needed, along with appropriate certificates.
    • Restrict who can query your metrics endpoints.
  2. Limit Over-Collection:
    • High-frequency scraping can lead to large data volumes and performance overhead.
    • Consider adjusting scrape intervals or sampling strategies (see the snippet after this list).
  3. Resource Requests & Limits:
    • Ensure the Prometheus server has enough CPU/memory to handle the ingestion load.
    • Tune retention times, storage volume, and potential remote write solutions.
  4. High Availability:
    • Run multiple Prometheus replicas if you need HA.
    • Tools like Thanos or Cortex can replicate data across multiple Prometheus instances.
  5. Extend with Logs & Tracing:
    • For a full observability stack, add log aggregation (e.g., Fluentd, Loki, Splunk) and distributed tracing (e.g., Jaeger, OpenTelemetry).
    • This helps correlate metrics with logs and traces for faster root cause analysis.
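As a starting point for tuning collection frequency, the global scrape settings live at the top of the Prometheus configuration (the values below are illustrative, not a recommendation for every cluster):

  global:
    scrape_interval: 30s    # how often targets are scraped (Prometheus' default is 1m)
    scrape_timeout: 10s     # per-scrape timeout; must not exceed scrape_interval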

In Summary

  • cAdvisor (built into the kubelet) collects per-container CPU, memory, network, and filesystem metrics.
  • Node Exporter provides OS-level metrics from each node.
  • kube-state-metrics exposes cluster resource and object metrics.
  • Metrics Server is essential for the Horizontal Pod Autoscaler.
  • Prometheus (scraping) + Grafana (dashboards) is the most common open-source solution.

By installing these components (often packaged together with the Prometheus Operator or the kube-prometheus-stack Helm chart), you’ll have a comprehensive view of container, pod, node, and cluster-level metrics in Kubernetes, all accessible for visualization and alerting.
