Kube-State-Metrics Scrape Config: Setup That Actually Works

If your metrics look incomplete, stale, or just wrong, the issue is often not Kubernetes; it is your scrape configuration.

Getting the kube-state-metrics scrape config right is what turns raw cluster data into something Prometheus can actually use. Without it, dashboards break, alerts misfire, and debugging becomes guesswork.

This guide walks you through exactly how to configure Prometheus to scrape Kube-State-Metrics the right way: cleanly, reliably, and without hidden pitfalls.

What “Scrape Config” Actually Means (In Simple Terms)

Before jumping into YAML, it helps to understand what is really happening. Prometheus does not “receive” metrics automatically. It pulls them. A scrape config tells Prometheus:

  • where to find metrics (target endpoint)
  • how often to fetch them (interval)
  • how to label them (metadata for querying)

Kube-State-Metrics exposes cluster state at an HTTP endpoint. Prometheus needs a precise configuration to discover and scrape that endpoint consistently.

If you want a deeper understanding of how this data flow works, it helps to see the full pipeline in action in this Kube State Metrics and Prometheus integration guide.

Why Kube-State-Metrics Needs a Proper Scrape Setup

Kube-State-Metrics is different from node exporters or application metrics. It does not measure performance. It exposes cluster state:

  • pod status
  • deployment replicas
  • job completions
  • resource conditions

That means:

  • Data must be fresh → otherwise alerts lag
  • Labels must be clean → otherwise queries break
  • Targets must be stable → otherwise metrics disappear

A weak scrape config leads to:

  • missing metrics
  • duplicate targets
  • incorrect labeling
  • noisy dashboards

Basic Kube State Metrics Scrape Config (Working Example)

Here is a clean and minimal Prometheus scrape config:

scrape_configs:
  - job_name: 'kube-state-metrics'
    scrape_interval: 30s
    metrics_path: /metrics

    static_configs:
      - targets:
          - kube-state-metrics.kube-system.svc.cluster.local:8080

What This Does

  • job_name → groups metrics under a logical name
  • scrape_interval → fetches metrics every 30 seconds
  • metrics_path → default endpoint exposed by KSM
  • targets → Kubernetes service endpoint

This works well for simple setups or local clusters. But in real environments, static configs are rarely enough.

Using Kubernetes Service Discovery (Recommended)

Static targets break easily in dynamic clusters. Instead, use Kubernetes service discovery:

scrape_configs:
  - job_name: 'kube-state-metrics'

    kubernetes_sd_configs:
      - role: endpoints

    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: kube-state-metrics

      - source_labels: [__meta_kubernetes_namespace]
        action: keep
        regex: kube-system

Why This Is Better

  • Automatically discovers the service
  • Adapts to pod restarts and scaling
  • Reduces manual updates
  • Keeps config future-proof

What the Relabeling Does

  • Filters only the kube-state-metrics service
  • Restricts scraping to the correct namespace

Without relabeling, Prometheus may scrape unnecessary endpoints.
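Conceptually, `action: keep` is a regex filter over discovered targets. Here is a minimal Python sketch of that behavior (the target dicts are illustrative stand-ins, not the real discovery payload):

```python
import re

def keep(targets, source_label, pattern):
    # Prometheus anchors relabel regexes, so use fullmatch, not search.
    rx = re.compile(pattern)
    return [t for t in targets if rx.fullmatch(t.get(source_label, ""))]

discovered = [
    {"__meta_kubernetes_service_name": "kube-state-metrics",
     "__meta_kubernetes_namespace": "kube-system"},
    {"__meta_kubernetes_service_name": "kube-dns",
     "__meta_kubernetes_namespace": "kube-system"},
]

# Apply the two keep rules from the config above, in order:
kept = keep(discovered, "__meta_kubernetes_service_name", "kube-state-metrics")
kept = keep(kept, "__meta_kubernetes_namespace", "kube-system")
print([t["__meta_kubernetes_service_name"] for t in kept])  # ['kube-state-metrics']
```

Each rule narrows the target set; targets that fail any `keep` are dropped before scraping.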

How to Verify Your Scrape Config Is Working

After applying your Prometheus scrape configuration, do not just assume everything is fine. Always verify it properly. This step helps you catch issues early before they affect your monitoring data.

Step 1: Open Prometheus Targets Page

Start by checking if Prometheus is actually scraping your target. Go to the Prometheus UI and open:

/targets

Now look for the kube-state-metrics job in the list. You should see:

  • Status: UP
  • Last scrape: recent (within the last minute or so)

If it shows DOWN, click on it to see the error. Issues here are usually related to service discovery or network access.
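You can also script this check against the Prometheus HTTP API at /api/v1/targets. A sketch, assuming Prometheus is reachable at localhost:9090 (the helper operates on the parsed JSON response, shown here with a canned sample in the same shape):

```python
import json
import urllib.request  # used by the live check below

def unhealthy(payload, job):
    """Return targets of the given job whose health is not 'up'."""
    return [
        t for t in payload["data"]["activeTargets"]
        if t["labels"].get("job") == job and t["health"] != "up"
    ]

# Live check (uncomment when Prometheus is reachable at this address):
# resp = urllib.request.urlopen("http://localhost:9090/api/v1/targets")
# payload = json.load(resp)

# Canned response in the API's shape, for illustration:
payload = {"data": {"activeTargets": [
    {"labels": {"job": "kube-state-metrics"}, "health": "up"},
]}}
print(unhealthy(payload, "kube-state-metrics"))  # [] means all targets are up
```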

Step 2: Check Metrics Directly

Next, confirm that data is actually coming in. Go to the Graph or Explore section and run a simple query:

kube_pod_info

If everything is working correctly, you will see active time series appear. If nothing shows up, it usually means one of two things:

  • scraping is not working
  • kube-state-metrics is not exposing metrics properly
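To check the second case, hit the KSM endpoint directly: it serves plain-text Prometheus exposition format. The sample lines below are illustrative of what KSM emits, and the tiny parser shows the shape a scraper works with:

```python
import re

# Illustrative lines in the shape KSM's /metrics endpoint returns:
sample = """\
# HELP kube_pod_info Information about pod.
# TYPE kube_pod_info gauge
kube_pod_info{namespace="default",pod="web-0"} 1
kube_pod_info{namespace="kube-system",pod="coredns-abc"} 1
"""

line_re = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')

series = []
for line in sample.splitlines():
    if line.startswith("#"):
        continue  # skip HELP/TYPE comment lines
    m = line_re.match(line)
    if m:
        name, labels, value = m.groups()
        series.append((name, labels, float(value)))

print(len(series))  # 2
```

If a curl of the endpoint returns nothing like this, the problem is on the KSM side, not in your scrape config.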

Step 3: Validate Labels

Now check the quality of your metrics. Inspect the output and look at key labels like:

  • namespace
  • pod
  • container

These labels should be clear and consistent. Why this matters:

  • clean labels make dashboards easier to build
  • consistent labels make filtering and grouping reliable
  • both make debugging much faster later

If labels look messy or missing, your scraping setup might still need tuning.

Common Mistakes That Break Scraping

1. Wrong Service Endpoint

A small DNS mistake breaks everything. Typical causes:

  • wrong namespace
  • wrong port
  • wrong service name

Always confirm:

kubectl get svc -n kube-system

2. Missing RBAC Permissions

Prometheus must be allowed to discover endpoints. Without proper RBAC:

  • targets do not appear
  • scraping silently fails

This is especially common in restricted clusters.
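As a sketch, here is a minimal ClusterRole granting the permissions endpoint discovery needs (the names prometheus-discovery, prometheus, and the monitoring namespace are illustrative; adapt them to your deployment):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-discovery
rules:
  - apiGroups: [""]
    resources: ["endpoints", "services", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-discovery
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-discovery
subjects:
  - kind: ServiceAccount
    name: prometheus        # the service account your Prometheus pod runs as
    namespace: monitoring
```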

3. Overly Broad Discovery

Scraping everything sounds easy but creates chaos. Without filtering:

  • Prometheus scrapes unrelated services
  • metrics become noisy
  • storage usage increases

Always limit targets using relabel configs.

4. Too Frequent Scraping

Lower interval ≠ better monitoring. Example mistake:

scrape_interval: 5s

This causes:

  • unnecessary load
  • duplicate data
  • faster storage consumption

For Kube-State-Metrics, a 30s–60s interval is usually enough.
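Rough arithmetic shows why. Assuming an illustrative 10,000 KSM time series (check your own count), the ingested sample volume scales inversely with the interval:

```python
series = 10_000          # illustrative series count, not a measured value
seconds_per_day = 86_400

for interval in (5, 30, 60):
    samples = series * seconds_per_day // interval
    print(f"{interval:>2}s interval -> {samples:,} samples/day")
```

A 5s interval ingests six times the samples of a 30s interval with no extra insight, since cluster state rarely changes that fast.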

5. Ignoring Metrics Cardinality

Kube-State-Metrics exposes many labels. Too many unique label combinations (high cardinality) lead to:

  • slow queries
  • heavy Prometheus memory usage

Fix:

  • avoid collecting unused metrics
  • control label usage where possible
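One way to cut this at scrape time is a drop rule on the Prometheus side (the metric name pattern here is illustrative; drop is the inverse of keep):

```yaml
metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'kube_pod_container_status_last_terminated_.*'
    action: drop
```

Dropped series are discarded before storage, so they cost nothing beyond the scrape itself.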

Advanced Improvements (When You Need More Control)

Once the basics are solid, you can optimize further.

Add Custom Labels for Better Querying

relabel_configs:
  - target_label: cluster
    replacement: production

This helps when:

  • managing multiple clusters
  • building shared dashboards
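An alternative for the multi-cluster case is Prometheus's global external_labels, which attaches the label to data leaving this server (remote write, federation, alerts):

```yaml
global:
  external_labels:
    cluster: production
```

Use the relabel approach when you need the label on locally stored series; use external_labels when clusters feed a shared backend.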

Filter Metrics at Scrape Time

If you do not need everything:

metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'kube_pod_.*'
    action: keep

This keeps only pod-related metrics. Result:

  • cleaner data
  • faster queries
  • lower storage cost

Separate Jobs for Clarity

Instead of mixing kube-state-metrics into a shared job with other targets, give it a dedicated one:

job_name: 'kube-state-metrics'

Keep it isolated. This makes:

  • debugging easier
  • dashboards cleaner
  • alerts more predictable

How Kube-State-Metrics Fits Into Your Monitoring Flow

Kube-state-metrics is like a translator between Kubernetes and Prometheus. Understanding the full flow prevents misconfiguration.

1. Kubernetes continuously updates the status of objects like pods and deployments, but Prometheus cannot read that state directly.

2. Kube-State-Metrics converts that state into metrics Prometheus can understand.

3. Prometheus scrapes the endpoint to collect the data.

4. The data is stored and queried.

5. Dashboards and alerts use that data.

If something goes wrong at step 3, everything below it stops working properly. You might still see Prometheus running, but your dashboards can show empty or incomplete data.

When Static Config Is Still Fine

Static config is fine when your setup is simple and does not change much.

  • It works well for local clusters, Docker-based setups, or testing environments where you just want things to run quickly.
  • It is easy because you manually tell Prometheus what to scrape. But in real production systems, things change often, and static config becomes harder to manage.

That is why most production setups use service discovery instead.

Quick Checklist (Before You Move On)

Before you finish, quickly confirm that your setup:

  • uses correct service endpoint
  • has RBAC configured
  • filters targets properly
  • uses reasonable scrape interval (not too fast or too slow)
  • avoids unnecessary metrics

If all of this looks good, your setup is healthy and ready to use.

Conclusion

A good kube-state-metrics scrape config is not complicated, but it has to be intentional. Most monitoring issues do not come from Kubernetes itself. They come from weak or incomplete scraping setups.

Get the basics right, keep the config clean, and your metrics will stay reliable, predictable, and useful.

FAQ Section

1. What port does Kube-State-Metrics expose?

By default, it exposes metrics on port 8080 at the /metrics endpoint.

2. Do I need service discovery for Kube-State-Metrics?

Not strictly. But in dynamic Kubernetes environments, service discovery is strongly recommended for stability.

3. How often should Prometheus scrape Kube-State-Metrics?

A 30–60 second interval is usually sufficient. Faster scraping rarely adds value.

4. Why are my Kube-State-Metrics targets showing DOWN?

Common causes are:

  • wrong service endpoint
  • missing RBAC permissions
  • incorrect namespace filtering

5. Can I reduce the number of metrics scraped?

Yes. Use metric_relabel_configs to filter out unnecessary metrics and reduce load.
