Pod Metrics
Track lifecycle, phase, resource requests, and container-level state
28 metrics
kube_pod_info info
Information about a pod. Returns 1 for each pod. Useful for joining with other pod metrics through vector matching (on/group_left) to attach labels such as node.
Labels: namespace, pod, node, host_ip, pod_ip, uid
kube_pod_info{namespace="production", node="worker-1"}
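The join mentioned in the description can be sketched as follows. This is a minimal example, assuming the other metric shares the namespace and pod labels with kube_pod_info; the choice of kube_pod_container_status_restarts_total as the left-hand side is only for illustration.

```promql
# Restart counters annotated with the node each pod runs on.
# kube_pod_info is always 1, so the multiplication leaves the value unchanged
# and group_left (node) copies the node label onto the result.
kube_pod_container_status_restarts_total
  * on (namespace, pod) group_left (node)
kube_pod_info
```

Because many container series map to one kube_pod_info series, the many-to-one side goes on the left, which is why group_left is used here rather than group_right.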
# Count pods per node
count by (node) (kube_pod_info)
kube_pod_status_phase gauge
The current phase of the pod (Pending, Running, Succeeded, Failed, Unknown). Value is 1 for the active phase, 0 otherwise.
Labels: namespace, pod, phase
# Count of Running pods per namespace
count by (namespace) (
kube_pod_status_phase{phase="Running"} == 1
)
# Alert: pods stuck in Pending
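kube_pod_status_phase{phase="Pending"} == 1
kube_pod_status_ready gauge
Describes whether the pod is ready to serve requests. One series is exported per condition value (true, false, unknown); the series matching the current state has value 1 and the others 0.
Labels: namespace, pod, condition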
# Percentage of ready pods
sum(kube_pod_status_ready{condition="true"}) /
count(kube_pod_info) * 100
kube_pod_container_status_restarts_total counter
The cumulative number of restarts for each container. A rapidly rising count often points to CrashLoopBackOff or OOM kills.
Labels: namespace, pod, container
# Alert: container restarting frequently
increase(kube_pod_container_status_restarts_total[1h]) > 5
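To separate OOM kills from other restart causes, the restart counter can be combined with the last termination reason. This is a sketch that assumes kube_pod_container_status_last_terminated_reason is exported by your kube-state-metrics version with a reason label; the 1h window is an arbitrary choice.

```promql
# Containers that restarted in the last hour and whose most recent
# termination reason was OOMKilled
(increase(kube_pod_container_status_restarts_total[1h]) > 0)
  and on (namespace, pod, container)
(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1)
```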
kube_pod_container_resource_requests gauge
The amount of a resource (CPU cores or memory bytes) requested by a container.
Labels: namespace, pod, container, resource, unit
# Total CPU requested in namespace
sum by (namespace) (
kube_pod_container_resource_requests{resource="cpu"}
)
kube_pod_container_resource_limits gauge
The resource limit (CPU cores or memory bytes) set for a container.
Labels: namespace, pod, container, resource, unit
# Containers without memory limits
count(
kube_pod_container_info
unless on (namespace, pod, container)
kube_pod_container_resource_limits{resource="memory"}
)
Deployment Metrics
Monitor rollout status, replica counts, and update strategy
16 metrics
kube_deployment_status_replicas_available gauge
The number of available replicas per deployment.
Labels: namespace, deployment
# Alert: deployment has unavailable replicas
kube_deployment_spec_replicas - kube_deployment_status_replicas_available > 0
kube_deployment_status_replicas_updated gauge
The number of updated replicas per deployment. Watch this converge during rolling updates.
Labels: namespace, deployment
# Rollout progress as percentage
kube_deployment_status_replicas_updated / kube_deployment_spec_replicas * 100
kube_deployment_spec_replicas gauge
Number of desired pods for a deployment as specified in the deployment spec.
Labels: namespace, deployment
# Deployments scaled to zero
kube_deployment_spec_replicas == 0
kube_deployment_status_condition gauge
The current status conditions of a deployment (Available, Progressing, ReplicaFailure).
Labels: namespace, deployment, condition, status
kube_deployment_status_condition{
condition="ReplicaFailure", status="true"
} == 1
Node Metrics
Node readiness, conditions, capacity, and allocatable resources
22 metrics
kube_node_status_condition gauge
The condition status of a node (Ready, MemoryPressure, DiskPressure, NetworkUnavailable, PIDPressure).
Labels: node, condition, status
# Alert: node not Ready
kube_node_status_condition{
condition="Ready", status="true"
} == 0
kube_node_status_allocatable gauge
The amount of each node resource that is available for scheduling, after system-reserved resources are subtracted.
Labels: node, resource, unit
# CPU allocation ratio per node
sum by (node) (kube_pod_container_resource_requests{resource="cpu"}) /
sum by (node) (kube_node_status_allocatable{resource="cpu"})
kube_node_info info
Information about a node including kernel version, OS image, container runtime version, and kubelet version.
Labels: node, kernel_version, os_image, container_runtime_version, kubelet_version
# Count nodes by kubelet version
count by (kubelet_version) (kube_node_info)
kube_node_spec_taint gauge
The taint of a node. Useful for tracking cordoned/drained nodes and custom scheduling constraints.
Labels: node, key, value, effect
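For the custom scheduling constraints mentioned in the description, one possible overview is a count of nodes per taint. This is only a sketch using the key and effect labels listed for this metric.

```promql
# Number of nodes carrying each taint, grouped by taint key and effect
count by (key, effect) (kube_node_spec_taint)
```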
# Cordoned nodes
kube_node_spec_taint{
key="node.kubernetes.io/unschedulable"
}
StatefulSet Metrics
Monitor ordered pod management, update strategy, and persistence
10 metrics
kube_statefulset_status_replicas_ready gauge
The number of ready replicas for this StatefulSet controller.
Labels: namespace, statefulset
# StatefulSets not fully available
kube_statefulset_replicas != kube_statefulset_status_replicas_ready
kube_statefulset_status_current_revision gauge
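Indicates the revision of the StatefulSet used to generate Pods. The value is always 1 and the revision is carried in the revision label, which makes it possible to track rolling update progress.
Labels: namespace, statefulset, revision
# StatefulSet mid-rollout (current revision has no matching update revision)
kube_statefulset_status_current_revision unless kube_statefulset_status_update_revision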
DaemonSet Metrics
Track desired vs current vs ready counts for node-level workloads
11 metrics
kube_daemonset_status_desired_number_scheduled gauge
The number of nodes that should be running the daemon pod.
Labels: namespace, daemonset
# DaemonSet coverage gap
kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_number_ready
kube_daemonset_status_number_misscheduled gauge
The number of nodes that are running a daemon pod but are not supposed to. Indicates scheduling constraint drift.
Labels: namespace, daemonset
# Alert: misscheduled DaemonSet pods
kube_daemonset_status_number_misscheduled > 0
PersistentVolumeClaim Metrics
Storage binding status, capacity, and access modes
9 metrics
kube_persistentvolumeclaim_status_phase gauge
The phase the persistent volume claim is currently in (Bound, Lost, Pending).
Labels: namespace, persistentvolumeclaim, phase
# Unbound PVCs
kube_persistentvolumeclaim_status_phase{
phase!="Bound"
} == 1
kube_persistentvolumeclaim_resource_requests_storage_bytes gauge
The capacity of storage requested by the persistent volume claim.
Labels: namespace, persistentvolumeclaim
# Total storage requested per namespace (GiB)
sum by (namespace) (
kube_persistentvolumeclaim_resource_requests_storage_bytes
) / 1073741824
HorizontalPodAutoscaler Metrics
Current vs desired replicas and scaling conditions
10 metrics
kube_horizontalpodautoscaler_status_current_replicas gauge
Current number of replicas of pods managed by this autoscaler.
Labels: namespace, horizontalpodautoscaler
# HPAs at max capacity
kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
kube_horizontalpodautoscaler_spec_min_replicas gauge
Lower limit for the number of replicas to which the autoscaler can scale down.
Labels: namespace, horizontalpodautoscaler
# HPAs sitting at their minimum replica count (fully scaled down)
kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_min_replicas
Job & CronJob Metrics
Completion status, duration, active pods, and schedule health
15 metrics
kube_job_status_active gauge
The number of actively running pods for a Job.
Labels: namespace, job_name
# Long-running jobs (over 1 hour)
kube_job_status_active > 0 and (time() - kube_job_status_start_time) > 3600
kube_job_status_failed gauge
The number of pods which reached phase Failed for a Job.
Labels: namespace, job_name, condition
# Alert: any job failure
kube_job_status_failed > 0
kube_cronjob_next_schedule_time gauge
Next time the CronJob should be scheduled. Useful for detecting missed schedules.
Labels: namespace, cronjob
# Missed CronJob executions (next schedule time is more than 1 hour in the past)
time() - kube_cronjob_next_schedule_time > 3600
Namespace Metrics
Namespace phase, labels, and annotations
4 metrics
kube_namespace_status_phase gauge
The phase of the namespace (Active or Terminating).
Labels: namespace, phase
# Namespaces in the Terminating phase
kube_namespace_status_phase{phase="Terminating"} == 1
Service & Ingress Metrics
Service type, selector, and ingress TLS/backend information
14 metrics
kube_service_info info
Information about a Kubernetes service including cluster IP, type, and external IP.
Labels: namespace, service, cluster_ip, external_ip, type
# Count LoadBalancer services
count(kube_service_info{type="LoadBalancer"})
kube_ingress_info info
Information about an Ingress resource including class, default backend, and TLS configuration.
Labels: namespace, ingress, ingressclass
# Ingress objects per namespace
count by (namespace) (kube_ingress_info)