Monitoring Kubernetes Clusters with Prometheus and Grafana: A Practical Guide for DevOps Engineers
In the world of modern DevOps and SRE, Kubernetes has become the de facto standard for orchestrating containerized applications. However, orchestrating workloads at scale introduces new observability challenges. Effective monitoring is crucial for ensuring reliability, optimizing performance, and enabling rapid troubleshooting. This guide provides a step-by-step approach to monitoring Kubernetes clusters using Prometheus and Grafana, complete with actionable examples and code snippets.
Why Monitor Kubernetes with Prometheus and Grafana?
Kubernetes is highly dynamic: pods are ephemeral, workloads scale up and down, and the underlying nodes are often short-lived too. Native tools like kubectl top show a point-in-time snapshot of resource usage, but they cannot provide historical trends, alerting, or custom dashboards. This is where Prometheus (for metrics collection and alerting) and Grafana (for visualization) shine. Together, they form an open-source, cloud-native monitoring stack trusted by DevOps teams worldwide.
- Prometheus: Pulls metrics from Kubernetes components, stores them efficiently, and provides a powerful query language (PromQL).
- Grafana: Connects to Prometheus and other data sources, allowing you to build interactive dashboards and set up alerts.
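To make the PromQL point concrete, here is a minimal sketch of running an instant query against the Prometheus HTTP API (/api/v1/query) using only the Python standard library. The address http://localhost:9090 is an assumption (a locally port-forwarded Prometheus), and the helper names are illustrative rather than part of any library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus has been port-forwarded to this address.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


def parse_vector(payload: dict) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like {"metric": {labels}, "value": [timestamp, "value"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def query(promql: str, base_url: str = PROMETHEUS_URL) -> list[tuple[dict, float]]:
    """Run an instant query against a live Prometheus server."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

With a reachable server, query("sum by (job) (up)") would return one entry per scrape job with the number of targets currently up.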
Step 1: Deploy Prometheus in Your Kubernetes Cluster
The fastest way to get started is with the Kube-Prometheus Stack, which bundles Prometheus, Alertmanager, Grafana, and useful exporters. We’ll use Helm for simplicity.
Prerequisites
- A running Kubernetes cluster (minikube, kind, EKS, GKE, AKS, etc.)
- kubectl and helm installed
Install the Kube-Prometheus Stack with Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Create a monitoring namespace
kubectl create namespace monitoring
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring
This command installs Prometheus, Grafana, node-exporter, and related components in the monitoring namespace. It automatically discovers and scrapes Kubernetes metrics.
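One way to confirm that discovery is working is to inspect the targets Prometheus reports through its HTTP API (/api/v1/targets). The sketch below summarizes the activeTargets section of that response by scrape pool and health. It assumes the JSON has already been fetched (for example from a port-forwarded Prometheus), and the function names are illustrative.

```python
def summarize_targets(payload: dict) -> dict[str, dict[str, int]]:
    """Count active targets per scrape pool, grouped by health state."""
    summary: dict[str, dict[str, int]] = {}
    for target in payload["data"]["activeTargets"]:
        pool = target.get("scrapePool", "<unknown>")
        health = target.get("health", "unknown")  # "up", "down", or "unknown"
        counts = summary.setdefault(pool, {})
        counts[health] = counts.get(health, 0) + 1
    return summary


def unhealthy_pools(summary: dict[str, dict[str, int]]) -> list[str]:
    """Return the scrape pools with at least one target that is not "up"."""
    return sorted(
        pool
        for pool, counts in summary.items()
        if any(count for health, count in counts.items() if health != "up")
    )
```

An empty result from unhealthy_pools suggests every discovered target is being scraped successfully; anything else is a good place to start looking at the lastError field of the affected targets.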
Step 2: Access and Configure Grafana Dashboards
Once installed, Grafana is exposed as a Kubernetes service. By default it’s not accessible outside the cluster, but you can port-forward for local access:
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
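Now open Grafana at http://localhost:3000 in your browser. The admin user is admin; the password is stored in the kube-prometheus-stack-grafana secret (key admin-password), and for this chart it has typically defaulted to prom-operator rather than Grafana's stock admin/admin. Change it immediately. Out of the box, you'll find dashboards for: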
- Cluster resource usage
- Pod health and restarts
- Kubernetes Control Plane metrics
- Node performance
You can import community dashboards for more granular insights, e.g., Kubernetes Cluster Monitoring (ID: 315) from grafana.com/grafana/dashboards.
Example: Importing a Community Dashboard
- In Grafana, go to "Dashboards" → "Import".
- Enter the dashboard ID (315) and click "Load".
- Select Prometheus as the data source and click "Import".
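The same import can be scripted. The sketch below prepares a dashboard JSON document that was exported for sharing so it can be sent to Grafana's POST /api/dashboards/db endpoint. It assumes the export declares its data source placeholders in an "__inputs" list (as community dashboards commonly do) and that the target data source is named "Prometheus"; both are assumptions to check against your own setup, and the helper names are hypothetical.

```python
def _substitute(node, mapping: dict[str, str]):
    """Recursively replace placeholder strings anywhere in the dashboard."""
    if isinstance(node, str):
        for placeholder, value in mapping.items():
            node = node.replace(placeholder, value)
        return node
    if isinstance(node, list):
        return [_substitute(item, mapping) for item in node]
    if isinstance(node, dict):
        return {key: _substitute(value, mapping) for key, value in node.items()}
    return node


def build_import_payload(dashboard: dict, datasource: str = "Prometheus") -> dict:
    """Build the request body for POST /api/dashboards/db."""
    # The UI import dialog fills in "${DS_...}" placeholders; an API import
    # has to do the same substitution itself.
    mapping = {
        "${" + item["name"] + "}": datasource
        for item in dashboard.get("__inputs", [])
        if item.get("type") == "datasource"
    }
    prepared = _substitute(dashboard, mapping)
    prepared.pop("__inputs", None)
    prepared.pop("__requires", None)
    prepared["id"] = None  # let Grafana assign a new internal id
    return {"dashboard": prepared, "overwrite": True}
```

The returned dictionary can be serialized with json.dumps and posted with an authenticated request, which makes the import repeatable across clusters.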
Step 3: Creating Custom Prometheus Alerts
Prometheus alerting rules are written in YAML, with a PromQL expression as the condition. These rules can detect critical situations like high pod restarts, CPU throttling, or node failures, and firing alerts are routed to notification channels by Alertmanager.
Example: Alert When a Deployment Has Zero Replicas
groups:
  - name: kubernetes.rules
    rules:
      - alert: DeploymentReplicasLow
        expr: kube_deployment_status_replicas_available{namespace="default", deployment="my-app"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Deployment {{ $labels.deployment }} has no available replicas"
          description: "Deployment {{ $labels.deployment }} in namespace {{ $labels.namespace }} has zero available replicas for more than 5 minutes."
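Save this file as custom-alert-rules.yaml. Because the kube-prometheus-stack chart runs Prometheus through the Prometheus Operator, rules are usually loaded in one of two ways: place the groups under additionalPrometheusRulesMap in a custom values.yaml and run helm upgrade, or wrap them in a PrometheusRule resource whose labels match the operator's rule selector.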
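As an illustration of the second route, the sketch below wraps the rule group from the example in a PrometheusRule manifest and prints it as JSON, which kubectl accepts alongside YAML. The resource name and the release label value are assumptions: the label is what the chart's default rule selector commonly expects for a release named kube-prometheus-stack, so adjust it to your installation.

```python
import json

# The rule group from the example above, expressed as Python data.
GROUPS = [{
    "name": "kubernetes.rules",
    "rules": [{
        "alert": "DeploymentReplicasLow",
        "expr": 'kube_deployment_status_replicas_available{namespace="default", deployment="my-app"} == 0',
        "for": "5m",
        "labels": {"severity": "critical"},
        "annotations": {
            "summary": "Deployment {{ $labels.deployment }} has no available replicas",
        },
    }],
}]


def wrap_as_prometheus_rule(name: str, namespace: str, groups: list,
                            release: str = "kube-prometheus-stack") -> dict:
    """Wrap plain rule groups in a PrometheusRule custom resource."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Assumed label: must match the operator's rule selector.
            "labels": {"release": release},
        },
        "spec": {"groups": groups},
    }


if __name__ == "__main__":
    manifest = wrap_as_prometheus_rule("custom-alert-rules", "monitoring", GROUPS)
    print(json.dumps(manifest, indent=2))
```

Piping the output into kubectl apply -f - creates the resource, after which the operator should pick up the rule without a Prometheus restart.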
Step 4: Visualizing Custom Metrics from Your Application
Expose custom metrics from your applications by instrumenting your code with a Prometheus client library. Here’s a Python example using prometheus_client:
from prometheus_client import Counter, start_http_server

# A counter only ever increases; it is exposed on /metrics.
REQUESTS = Counter('http_requests_total', 'Total HTTP Requests')

def handle_request():
    REQUESTS.inc()
    # handle your request logic

if __name__ == "__main__":
    # Serve /metrics on port 8000 from a background thread.
    start_http_server(8000)
    # App code here
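Deploy your app, then tell Prometheus to scrape /metrics from your pods. Because the stack runs the Prometheus Operator, you do this by creating a PodMonitor or ServiceMonitor resource rather than editing the scrape configuration by hand: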
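apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-app-podmonitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # assumed: matches the chart's default PodMonitor selector
spec:
  namespaceSelector:
    matchNames:
      - default  # assumed: the namespace where my-app runs
  selector:
    matchLabels:
      app: my-app
  podMetricsEndpoints:
    - port: metrics  # name of the container port that serves metrics
      path: /metrics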
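When a PodMonitor produces no targets, the cause is usually one of three mismatches: namespace, labels, or port name. The sketch below is a simplified model of that matching logic (it handles matchLabels only, not matchExpressions) and is meant as a debugging aid, not a reimplementation of the operator.

```python
def pod_monitor_selects(monitor: dict, pod: dict) -> bool:
    """Simplified check of whether a PodMonitor would select a pod."""
    spec = monitor["spec"]

    # Without a namespaceSelector, only the PodMonitor's own namespace is searched.
    ns_selector = spec.get("namespaceSelector", {})
    if ns_selector.get("any"):
        namespace_ok = True
    else:
        allowed = ns_selector.get("matchNames") or [monitor["metadata"]["namespace"]]
        namespace_ok = pod["metadata"]["namespace"] in allowed

    # Every label in matchLabels must be present on the pod with the same value.
    wanted = spec.get("selector", {}).get("matchLabels", {})
    labels = pod["metadata"].get("labels", {})
    labels_ok = all(labels.get(key) == value for key, value in wanted.items())

    # The endpoint refers to a container port by name, not by number.
    port_names = {
        port.get("name")
        for container in pod["spec"]["containers"]
        for port in container.get("ports", [])
    }
    ports_ok = any(ep.get("port") in port_names
                   for ep in spec.get("podMetricsEndpoints", []))

    return namespace_ok and labels_ok and ports_ok
```

For instance, a pod in the default namespace is not selected by a PodMonitor in monitoring that has no namespaceSelector, even when its labels and port name match.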
Step 5: Enabling Alerting and Notification Channels
Alertmanager, part of the Kube-Prometheus Stack, routes alerts to notification channels: email, Slack, PagerDuty, and more. To configure Slack notifications, edit the Alertmanager config (values.yaml):
alertmanager:
  config:
    receivers:
      - name: 'null'  # kept because the chart's default routes may still reference it
      - name: 'slack-notifications'
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
            channel: '#alerts'
    route:
      receiver: 'slack-notifications'
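Apply the change with helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace monitoring -f values.yaml. Once the release is updated, Alertmanager notifies your team in Slack when alerts fire.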
Best Practices for Kubernetes Monitoring
- Monitor both infrastructure and application metrics for a complete picture.
- Set up SLO-driven alerts to minimize alert fatigue and focus on user-impacting issues.
- Automate dashboard provisioning using Grafana as Code or Helm charts for reproducibility.
- Secure your monitoring stack with RBAC, network policies, and authentication for Grafana and Prometheus endpoints.
- Continuously review and tune alerts to avoid noise and ensure actionable notifications.
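To illustrate what an SLO-driven alert condition can look like, here is a small sketch of the burn-rate arithmetic often used for it: the observed error ratio divided by the error budget (1 minus the SLO target). The 14.4 threshold and the two-window check follow a widely used convention for a 30-day SLO and are assumptions here, not values from this guide.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than allowed the error budget is being spent."""
    budget = 1.0 - slo_target
    if not 0.0 < budget < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio / budget


def should_page(long_window_ratio: float, short_window_ratio: float,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only if both a long and a short window burn budget too fast.

    The short window keeps the alert from firing long after the problem has
    stopped; the long window filters out brief spikes.
    """
    return (burn_rate(long_window_ratio, slo_target) >= threshold
            and burn_rate(short_window_ratio, slo_target) >= threshold)
```

With a 99.9% target, a sustained 2% error ratio is a burn rate of about 20, which would page; if the short window has already recovered to 0.1%, it would not. In practice the two ratios would come from PromQL expressions over different range windows.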
Conclusion
Monitoring Kubernetes clusters with Prometheus and Grafana is essential for modern DevOps and SRE teams. By following the steps above, you can deploy a complete observability solution, track key metrics, visualize trends, and proactively address issues before they impact users. Start small: deploy the stack, import ready-to-use dashboards, and incrementally add custom metrics and alerting as your monitoring maturity grows.
Ready to take your observability to the next level? Try deploying the Kube-Prometheus Stack today and start building dashboards that matter to your team.