Kube-State-Metrics Datadog: The Missing Monitoring Layer

If you are running Kubernetes in production, raw metrics are not your problem; visibility is. You can already monitor CPU and memory, but that does not tell you why pods are restarting, why deployments fail, or why replica counts do not match expectations.

That is where kube-state-metrics Datadog integration becomes useful. Instead of guessing cluster behavior, you get structured insights into Kubernetes objects directly inside Datadog, where alerts, dashboards, and workflows already exist.

This guide shows you how it works, how to set it up correctly, and more importantly, when you actually need it.

What Kube-State-Metrics Adds to Datadog

Datadog already collects infrastructure metrics. But Kubernetes is not just infrastructure; it is a system of objects. Kube-State-Metrics (KSM) exposes the state of Kubernetes resources rather than performance data. That includes:

  • Pod status (Running, Pending, Failed)
  • Deployment replicas (desired vs actual)
  • Node conditions (Ready, NotReady)
  • Job completions and failures

Without KSM, Datadog sees resource usage. With KSM, Datadog understands cluster behavior. That difference matters.

How the Integration Works (Simple View)

At a glance, this setup just connects two tools. But what really matters is how the data flows from your cluster into something you can actually understand and act on.

Step 1: Kube-State-Metrics Collects Data

Kube-State-Metrics (KSM) talks directly to the Kubernetes API. It does not monitor CPU or memory like traditional tools. Instead, it focuses on object state.

Think of things like:

  • Is your deployment ready or stuck?
  • How many replicas are actually running vs expected?
  • Are pods restarting again and again?

KSM takes all of this raw state information and converts it into clean, structured metrics. This is the missing visibility layer most clusters need.
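To make this concrete, here is a minimal sketch of what KSM's exposition output looks like and the kind of comparison it enables. The series names are real KSM metrics, but the label values and counts are invented for illustration:

```shell
# Illustrative sample of kube-state-metrics exposition output.
# The metric names are real KSM series; the label values are made up.
metrics='kube_pod_status_phase{namespace="default",pod="web-0",phase="Running"} 1
kube_deployment_spec_replicas{namespace="default",deployment="web"} 3
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 2'

# The core comparison this data enables: desired vs available replicas.
desired=$(echo "$metrics" | grep 'kube_deployment_spec_replicas' | awk '{print $2}')
available=$(echo "$metrics" | grep 'kube_deployment_status_replicas_available' | awk '{print $2}')

if [ "$desired" -ne "$available" ]; then
  echo "replica mismatch: want $desired, have $available"
fi
```

In a real setup you never parse this by hand; Datadog does it for you. The point is that the raw state is already structured and comparable.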

Step 2: Datadog Cluster Agent Scrapes Metrics

The Datadog Cluster Agent acts like a collector. It regularly “scrapes” the metrics exposed by KSM.

Instead of querying Kubernetes directly for everything, Datadog relies on KSM to provide a simplified and consistent stream of cluster state data. This reduces complexity and keeps monitoring efficient.

Kubernetes monitoring flow: Kubernetes cluster → kube-state-metrics → Datadog dashboard with metrics and graphs.

Step 3: Metrics Are Sent to Datadog

Once collected, the data is pushed into Datadog. Here is where things become useful:

  • Metrics are stored and organized
  • Historical trends become visible
  • You can correlate state data with logs and performance metrics

This turns raw Kubernetes signals into something you can actually analyze.

Step 4: You Build Visibility on Top

Now comes the part that impacts your day-to-day work. Inside Datadog, you use this data to:

  • Create dashboards that show cluster health at a glance
  • Set alerts when something breaks (like pods not becoming ready)
  • Build monitors that catch issues before users notice

Instead of guessing what is wrong, you get clear signals tied to real cluster events.

Kube-State-Metrics Datadog Setup (Step-by-Step)

You do not need a complex setup, but you do need a correct one.

Step 1: Deploy Kube-State-Metrics

If you are using Helm:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics
```

This deploys a service that exposes Kubernetes object metrics. Make sure:

  • It is running in your cluster
  • The service endpoint is reachable

Step 2: Enable Datadog Kubernetes Integration

If you are using the Datadog Agent, Kubernetes integration is typically enabled by default. But you need to ensure:

  • Cluster Agent is installed
  • RBAC permissions allow metric collection
  • Autodiscovery is active

Step 3: Configure Kube-State-Metrics Collection

Datadog needs to scrape KSM explicitly. In most setups, this is done via Autodiscovery annotations or Helm values. Example:

```yaml
datadog:
  kubeStateMetricsEnabled: true
```

Or ensure the Cluster Agent is configured to detect the KSM service.
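If you go the annotation route instead, the `kubernetes_state` check can be pointed at the KSM pod with Autodiscovery annotations along these lines. A sketch, assuming the container is named `kube-state-metrics` and listens on port 8080; adjust both to match your deployment:

```yaml
# Autodiscovery annotations on the kube-state-metrics pod template.
# Container name and port are assumptions; match them to your deployment.
metadata:
  annotations:
    ad.datadoghq.com/kube-state-metrics.check_names: '["kubernetes_state"]'
    ad.datadoghq.com/kube-state-metrics.init_configs: '[{}]'
    ad.datadoghq.com/kube-state-metrics.instances: '[{"kube_state_url": "http://%%host%%:8080/metrics"}]'
```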

Step 4: Verify Metrics in Datadog

Once everything is connected, check for metrics like:

  • `kubernetes_state.pod.status_phase`
  • `kubernetes_state.deployment.replicas_desired`
  • `kubernetes_state.node.by_condition`

If these appear, your setup is working.

What You Can Actually Monitor (Real Use Cases)

This is where most setups fail: collecting data without purpose. Here is what this integration is actually good for.

1. Deployment Health Tracking

You can compare:

  • Desired replicas
  • Available replicas

If they do not match, something is wrong, even if CPU usage looks fine.
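As a sketch, a replica-mismatch monitor could be defined roughly like this (rendered as YAML for readability; the exact tag names, such as `kube_deployment` vs `deployment`, depend on which version of the check you run, so verify against your own metrics):

```yaml
# Hypothetical Datadog monitor definition, shown as YAML.
name: "Deployment has fewer available replicas than desired"
type: "query alert"
query: "min(last_5m):sum:kubernetes_state.deployment.replicas_desired{*} by {kube_deployment} - sum:kubernetes_state.deployment.replicas_available{*} by {kube_deployment} > 0"
message: "Deployment {{kube_deployment.name}} is missing replicas."
```

Using `min(last_5m)` means the gap must persist for the full window before the alert fires, which filters out routine rollout churn.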

2. Pod Failure Detection

Instead of waiting for logs, you can detect:

  • CrashLoopBackOff
  • Pending pods
  • Failed scheduling

This is faster and easier to alert on.
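A minimal CrashLoopBackOff alert can key off the container waiting-reason metric. Something along these lines, assuming the metric and tag names of the current `kubernetes_state` core check (verify them against your Agent version):

```yaml
# Hypothetical monitor for crash-looping containers, shown as YAML.
name: "Container stuck in CrashLoopBackOff"
type: "query alert"
query: "max(last_10m):sum:kubernetes_state.container.status_report.count.waiting{reason:crashloopbackoff} by {kube_namespace,pod_name} >= 1"
message: "Pod {{pod_name.name}} in {{kube_namespace.name}} is crash looping."
```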

3. Node Readiness Issues

When a node goes NotReady, KSM surfaces the condition change right away, allowing early alerts before workloads fully fail.

4. Job and CronJob Monitoring

Track:

  • Successful completions
  • Failed jobs
  • Missed schedules

This is critical for pipelines and batch processing.
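A failed-job alert follows the same monitor pattern as the examples above; for instance, assuming the `kubernetes_state.job.failed` metric and `kube_job` tag from the core check:

```yaml
# Hypothetical failed-job monitor, shown as YAML.
name: "Kubernetes job failed"
type: "query alert"
query: "max(last_15m):sum:kubernetes_state.job.failed{*} by {kube_namespace,kube_job} >= 1"
message: "Job {{kube_job.name}} in {{kube_namespace.name}} failed."
```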

Datadog vs Prometheus: Where This Fits

This question comes up often: Why use Datadog instead of Prometheus for KSM?

Prometheus Approach

  • Scrapes KSM directly
  • Requires manual dashboards and alerting setup
  • Full control, but more work

Datadog Approach

  • Integrates KSM into existing monitoring
  • Easier dashboards and alerting
  • Less setup overhead

Neither is “better.” Use Prometheus if you want control. Use Datadog if you want speed and simplicity.

Kube-State-Metrics vs Datadog (Not a Real Comparison)

Many people try to compare these two, but they solve different problems.

Kube-State-Metrics is a data source. It reads Kubernetes objects and turns their state into metrics (like pod status, replica counts, deployment health).

Datadog is a monitoring platform. It takes those metrics and helps you:

  • visualize them in dashboards
  • track trends
  • get alerts when something breaks

In simple terms, Datadog shows how your system performs. KSM explains what is actually happening inside Kubernetes.

When both work together, you move from basic monitoring to real observability: you do not just see problems, you understand them.

Common Mistakes to Avoid

1. Collecting Everything

More data sounds useful, but it creates noise. Too many metrics slow queries, increase cost, and make dashboards messy. Start with key metrics, then expand only if needed.

2. Ignoring Labels

Labels add context, but too many cause high cardinality, which can break queries and dashboards. Keep only labels you actually use.

3. Skipping RBAC Checks

Wrong permissions = missing data. And it often fails silently. Always verify ServiceAccount, Roles, and Bindings if metrics look incomplete.
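For reference, KSM's ClusterRole needs read-only (`list`, `watch`) access to every object type it reports on. A trimmed sketch covering only the core objects discussed above (the chart-managed role lists many more resources):

```yaml
# Trimmed example of the read-only ClusterRole kube-state-metrics needs.
# The Helm chart manages the full version; this covers only core objects.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["list", "watch"]
```

If a resource type is missing from the rules, KSM keeps running but silently reports nothing for those objects, which matches the "fails silently" symptom described above.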

4. Not Building Alerts

Without alerts, data just sits there. Set alerts for real issues like pod failures, replica mismatches, and unhealthy nodes so you can react quickly.

5. Treating It as “Set and Forget”

Your cluster changes over time, so your monitoring should too. Regularly clean up unused metrics, adjust alerts, and keep dashboards relevant.

When You Should Use This Setup

This integration makes sense if:

  • You already use Datadog
  • You need Kubernetes object visibility
  • You want faster setup vs Prometheus

It may not be ideal if:

  • You want full control over metrics pipeline
  • You prefer open-source tooling only
  • You already have a mature Prometheus setup

Practical Optimization Framework

If your setup feels noisy or hard to manage, this simple approach keeps things clean and useful from day one.

Step 1: Start With Core Objects

Begin with the basics inside Kube-State-Metrics. Focus on pods, deployments, and nodes. These three give you a clear picture of cluster health:

  • pods show what is running (or failing)
  • deployments show if your app is stable
  • nodes show if the infrastructure is healthy

This keeps your data small but meaningful.

Step 2: Build Only Critical Alerts

In Datadog, avoid creating too many alerts at the start. Focus only on issues that need immediate attention:

  • pods failing or restarting
  • replicas not matching desired state
  • nodes becoming unhealthy

This helps you avoid alert fatigue and keeps notifications important.

Step 3: Expand Based on Need

Do not try to predict every future issue. Add more metrics only when:

  • you face a real problem
  • you need deeper visibility to debug something

This way, every metric you add has a clear purpose.

Step 4: Keep It Lean

Over time, things pile up. Clean your setup regularly:

  • remove metrics you never use
  • simplify dashboards that feel cluttered

A lean setup makes Datadog faster, easier to read, and much more reliable.

Troubleshooting Tip

If your metrics are missing, do not overthink it. In most cases, the problem is something simple and easy to fix.

Start by checking the connection between Datadog and Kube-State-Metrics. If Datadog is not scraping KSM, no data will ever show up even if everything else looks fine.

Next, look at RBAC permissions. If roles or bindings are wrong, KSM cannot read Kubernetes objects properly. The tricky part is that it may still run, but return incomplete or empty metrics. 

Then verify the service endpoint. Make sure:

  • the KSM service is running
  • the endpoint is reachable from the Datadog agent
  • there are no network or DNS issues blocking access

These three checks solve most issues quickly. Fix them first before going deeper into logs or advanced debugging.

Conclusion

Kube-State-Metrics does not replace Datadog, it completes it. Without it, you see resource usage. With it, you understand what your cluster is actually doing.

Set it up cleanly. Start small. Build intentionally. That is how you turn metrics into something useful.

FAQ Section

1. What does the kube-state-metrics Datadog integration do?

It allows Datadog to collect Kubernetes object state metrics such as pod status, deployment replicas, and node conditions.

2. Do I need kube-state-metrics if I already use Datadog?

Yes, if you want visibility into Kubernetes object states. Datadog alone does not provide this level of detail.

3. Is Datadog better than Prometheus for kube-state-metrics?

Not necessarily. Datadog is easier to use, while Prometheus offers more control. The choice depends on your setup.

4. Does kube-state-metrics increase monitoring cost?

It can, especially if you collect too many metrics or use high-cardinality labels. Proper filtering helps control this.

5. Can I run kube-state-metrics without Datadog?

Yes. It is commonly used with Prometheus, but it works with any system that can scrape metrics.
