If your metrics look incomplete, stale, or just wrong, the issue is often not Kubernetes; it is your scrape configuration.
Getting the kube-state-metrics scrape config right is what turns raw cluster data into something Prometheus can actually use. Without it, dashboards break, alerts misfire, and debugging becomes guesswork.
This guide walks you through exactly how to configure Prometheus to scrape Kube-State-Metrics the right way: cleanly, reliably, and without hidden pitfalls.
What “Scrape Config” Actually Means (In Simple Terms)
Before jumping into YAML, it helps to understand what is really happening. Prometheus does not “receive” metrics automatically. It pulls them. A scrape config tells Prometheus:
- where to find metrics (target endpoint)
- how often to fetch them (interval)
- how to label them (metadata for querying)
Kube-State-Metrics exposes cluster state at an HTTP endpoint. Prometheus needs a precise configuration to discover and scrape that endpoint consistently.
If you want a deeper understanding of how this data flow works, it helps to see the full pipeline in action in this Kube State Metrics and Prometheus integration guide.
Why Kube-State-Metrics Needs a Proper Scrape Setup
Kube-State-Metrics is different from node exporters or application metrics. It does not measure performance. It exposes cluster state:
- pod status
- deployment replicas
- job completions
- resource conditions
That means:
- Data must be fresh → otherwise alerts lag
- Labels must be clean → otherwise queries break
- Targets must be stable → otherwise metrics disappear
A weak scrape config leads to:
- missing metrics
- duplicate targets
- incorrect labeling
- noisy dashboards
Basic Kube State Metrics Scrape Config (Working Example)
Here is a clean and minimal Prometheus scrape config:
scrape_configs:
  - job_name: 'kube-state-metrics'
    scrape_interval: 30s
    metrics_path: /metrics
    static_configs:
      - targets:
          - kube-state-metrics.kube-system.svc.cluster.local:8080
What This Does
- job_name → groups metrics under a logical name
- scrape_interval → fetches metrics every 30 seconds
- metrics_path → default endpoint exposed by KSM
- targets → the Kubernetes service endpoint
This works well for simple setups or local clusters. But in real environments, static configs are rarely enough.
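Before relying on the static target, you can confirm the endpoint actually serves metrics. A quick check, assuming kubectl access and the default service name and port from the standard kube-state-metrics manifests:

```shell
# Forward the kube-state-metrics service port to your machine
kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &

# Fetch a few metrics; you should see kube_* metric names in plain text
curl -s http://localhost:8080/metrics | head -n 20
```

If curl returns nothing here, the problem is with kube-state-metrics itself, not your Prometheus config.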
Using Kubernetes Service Discovery (Recommended)
Static targets break easily in dynamic clusters. Instead, use Kubernetes service discovery:
scrape_configs:
  - job_name: 'kube-state-metrics'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: kube-state-metrics
      - source_labels: [__meta_kubernetes_namespace]
        action: keep
        regex: kube-system
Why This Is Better
- Automatically discovers the service
- Adapts to pod restarts and scaling
- Reduces manual updates
- Keeps config future-proof
What the Relabeling Does
- Filters only the kube-state-metrics service
- Restricts scraping to the correct namespace
Without relabeling, Prometheus may scrape unnecessary endpoints.
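Note that kube-state-metrics typically exposes two ports: the main metrics port (8080) and a self-telemetry port (8081). If you only want the main metrics, you can add one more filter. This is a sketch that assumes the service names its ports http-metrics and telemetry, as the standard manifests do; check your own service definition first:

```yaml
relabel_configs:
  - source_labels: [__meta_kubernetes_service_name]
    action: keep
    regex: kube-state-metrics
  - source_labels: [__meta_kubernetes_namespace]
    action: keep
    regex: kube-system
  # Keep only the main metrics port, not the self-telemetry port
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    action: keep
    regex: http-metrics
```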
How to Verify Your Scrape Config Is Working
After applying your Prometheus scrape configuration, do not just assume everything is fine. Always verify it properly. This step helps you catch issues early before they affect your monitoring data.
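Before reloading Prometheus at all, it is worth validating the file syntactically. Prometheus ships a promtool binary for exactly this:

```shell
# Validates scrape_configs, relabel rules, and referenced rule files
promtool check config prometheus.yml
```

A clean pass does not guarantee targets will be discovered, but it catches indentation and relabeling syntax errors before they take down your config reload.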
Step 1: Open Prometheus Targets Page
Start by checking if Prometheus is actually scraping your target. Go to the Prometheus UI and open:
/targets
Now look for the kube-state-metrics job in the list. You should see:
- Status: UP
- A recent last-scrape time (within one scrape interval)
If it shows DOWN, click on it to see the error. Issues here are usually related to service discovery or network access.
Step 2: Check Metrics Directly
Next, confirm that data is actually coming in. Go to the Graph or Explore section and run a simple query:
kube_pod_info
If everything is working correctly, you will see active time series data appear. If nothing shows up, it usually means:
- Scraping is not working
- Or kube-state-metrics is not exposing metrics properly
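Two quick queries help distinguish "scrape failing" from "no data":

```
# 1 if the last scrape succeeded, 0 if it failed
up{job="kube-state-metrics"}

# How many pod time series are currently being collected
count(kube_pod_info)
```

If `up` is 1 but `kube_pod_info` is empty, the scrape works and the problem is on the kube-state-metrics side.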
Step 3: Validate Labels
Now check the quality of your metrics. Inspect the output and look at key labels like:
- namespace
- pod
- container
These labels should be clear and consistent. Why this matters:
- Clean labels make your dashboards easier to build
- Helps in filtering and grouping metrics properly
- Makes debugging much faster later
If labels look messy or missing, your scraping setup might still need tuning.
Common Mistakes That Break Scraping
1. Wrong Service Endpoint
A small DNS mistake breaks everything. Example issue:
- wrong namespace
- wrong port
- wrong service name
Always confirm:
kubectl get svc -n kube-system
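If the service exists but Prometheus still cannot reach it, check that the service actually has endpoints backing it:

```shell
# An empty ENDPOINTS column means the pod is not ready
# or the service selector does not match the pod labels
kubectl get endpoints kube-state-metrics -n kube-system
```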
2. Missing RBAC Permissions
Prometheus must be allowed to discover endpoints. Without proper RBAC:
- targets do not appear
- scraping silently fails
This is especially common in restricted clusters.
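A minimal sketch of the RBAC Prometheus needs for endpoint discovery. The service account name (prometheus) and namespace (monitoring) here are assumptions; adjust them to match your deployment:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-discovery
rules:
  # Discovery with role: endpoints needs all three resources
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-discovery
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-discovery
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```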
3. Overly Broad Discovery
Scraping everything sounds easy but creates chaos. Without filtering:
- Prometheus scrapes unrelated services
- metrics become noisy
- storage usage increases
Always limit targets using relabel configs.
4. Too Frequent Scraping
Lower interval ≠ better monitoring. Example mistake:
scrape_interval: 5s
This causes:
- unnecessary load
- duplicate data
- faster storage consumption
For Kube-State-Metrics, an interval of 30s–60s is usually enough.
5. Ignoring Metrics Cardinality
Kube-State-Metrics exposes many labels, and high cardinality has real costs:
- slower queries
- heavier Prometheus memory usage
Fix:
- avoid collecting unused metrics
- control label usage where possible
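One common cardinality fix is dropping metric families you never query. For example, the kube_*_labels and kube_*_annotations metrics mirror arbitrary user-defined Kubernetes labels and tend to explode cardinality. A sketch of dropping them at scrape time, assuming your dashboards do not join on them (in recent kube-state-metrics versions these are opt-in to begin with):

```yaml
metric_relabel_configs:
  # Drop the *_labels and *_annotations families, which copy
  # arbitrary user-defined labels into Prometheus
  - source_labels: [__name__]
    regex: 'kube_.*_(labels|annotations)'
    action: drop
```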
Advanced Improvements (When You Need More Control)
Once the basics are solid, you can optimize further.
Add Custom Labels for Better Querying
relabel_configs:
  - target_label: cluster
    replacement: production
This helps when:
- managing multiple clusters
- building shared dashboards
Filter Metrics at Scrape Time
If you do not need everything:
metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'kube_pod_.*'
    action: keep
This keeps only pod-related metrics. Result:
- cleaner data
- faster queries
- lower storage cost
Separate Jobs for Clarity
Instead of letting kube-state-metrics be matched by a broad catch-all job, give it its own entry:
job_name: 'kube-state-metrics'
Keep it isolated. This makes:
- debugging easier
- dashboards cleaner
- alerts more predictable
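Concretely, a dedicated entry in scrape_configs might look like this (a sketch; the second job is just a placeholder for your other targets):

```yaml
scrape_configs:
  # Dedicated job: easy to find in /targets and in the `job` label
  - job_name: 'kube-state-metrics'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: kube-state-metrics

  # General application scraping stays in its own job
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
```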
How Kube-State-Metrics Fits Into Your Monitoring Flow
Kube-state-metrics is like a translator between Kubernetes and Prometheus. Understanding the full flow prevents misconfiguration.
1. Kubernetes keeps updating the status of things like pods and deployments. But Prometheus cannot directly read that information.
2. Kube-State-Metrics takes that data and turns it into simple metrics that Prometheus can understand.
3. Prometheus scrapes the endpoint to collect the data
4. Data is stored and queried
5. Dashboards and alerts use that data
If something goes wrong at step 3, everything below it stops working properly. You might still see Prometheus running, but your dashboards can show empty or incomplete data.
When Static Config Is Still Fine
Static config is fine when your setup is simple and does not change much.
- It works well for local clusters, Docker-based setups, or testing environments where you just want things to run quickly.
- It is easy because you manually tell Prometheus what to scrape. But in real production systems, things change often, and static config becomes harder to manage.
That is why most production setups use service discovery instead.
Quick Checklist (Before You Move On)
Before you finish, quickly double-check that your scrape config:
- uses correct service endpoint
- has RBAC configured
- filters targets properly
- uses reasonable scrape interval (not too fast or too slow)
- avoids unnecessary metrics
If all of this looks good, your setup is healthy and ready to use.
Conclusion
A good kube state metrics scrape config is not complicated, but it has to be intentional. Most monitoring issues do not come from Kubernetes itself; they come from weak or incomplete scraping setups.
Get the basics right, keep the config clean, and your metrics will stay reliable, predictable, and useful.
FAQ Section
1. What port does Kube-State-Metrics expose?
By default, it exposes metrics on port 8080 at the /metrics endpoint.
2. Do I need service discovery for Kube-State-Metrics?
Not strictly. But in dynamic Kubernetes environments, service discovery is strongly recommended for stability.
3. How often should Prometheus scrape Kube-State-Metrics?
A 30–60 second interval is usually sufficient. Faster scraping rarely adds value.
4. Why are my Kube-State-Metrics targets showing DOWN?
Common causes are:
- wrong service endpoint
- missing RBAC permissions
- incorrect namespace filtering
5. Can I reduce the number of metrics scraped?
Yes. Use metric_relabel_configs to filter out unnecessary metrics and reduce load.