Monitoring Kubernetes Clusters with Prometheus and Grafana: A Practical Guide for DevOps Engineers

In the world of modern DevOps and SRE, Kubernetes has become the de facto standard for orchestrating containerized applications. However, orchestrating workloads at scale introduces new observability challenges. Effective monitoring is crucial for ensuring reliability, optimizing performance, and enabling rapid troubleshooting. This guide provides a step-by-step approach to monitoring Kubernetes clusters using Prometheus and Grafana, complete with actionable examples and code snippets.

Why Monitor Kubernetes with Prometheus and Grafana?

Kubernetes is highly dynamic: pods are ephemeral, workloads scale up and down, and nodes come and go. Native tools like kubectl top offer point-in-time insights, but cannot provide historical trends, alerting, or custom dashboards. This is where Prometheus (for metrics collection and alerting) and Grafana (for visualization) shine. Together, they form an open-source, cloud-native monitoring stack trusted by DevOps teams worldwide.

  • Prometheus: Pulls metrics from Kubernetes components, stores them efficiently, and provides a powerful query language (PromQL).
  • Grafana: Connects to Prometheus and other data sources, allowing you to build interactive dashboards and set up alerts.
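To give a taste of PromQL, here are two queries you could run once the stack below is installed; the metric names come from cAdvisor and kube-state-metrics, which the stack ships by default:

```promql
# CPU usage per namespace over the last 5 minutes (cAdvisor metric)
sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)

# Containers that restarted in the last hour (kube-state-metrics metric)
increase(kube_pod_container_status_restarts_total[1h]) > 0
```

Queries like these power both Grafana panels and Prometheus alerting rules.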

Step 1: Deploy Prometheus in Your Kubernetes Cluster

The fastest way to get started is with the Kube-Prometheus Stack, which bundles Prometheus, Alertmanager, Grafana, and useful exporters. We’ll use Helm for simplicity.

Prerequisites

  • A running Kubernetes cluster (minikube, kind, EKS, GKE, AKS, etc.)
  • kubectl and helm installed

Install the Kube-Prometheus Stack with Helm

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create a monitoring namespace
kubectl create namespace monitoring

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring

This command installs Prometheus, Grafana, node-exporter, and related components in the monitoring namespace. It automatically discovers and scrapes Kubernetes metrics.
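The chart's defaults keep metrics on ephemeral storage, so data is lost when the Prometheus pod restarts. For anything beyond a quick demo, consider enabling persistence via Helm values; a sketch using the chart's documented value paths (the storageClassName is an assumption you should adjust to your cluster):

```yaml
# values.yaml -- persist Prometheus data across pod restarts
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard   # assumption: use a class your cluster provides
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
```

Apply it with helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring -f values.yaml.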

Step 2: Access and Configure Grafana Dashboards

Once installed, Grafana is exposed as a Kubernetes service. By default it’s not accessible outside the cluster, but you can port-forward for local access:

kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

Now access Grafana at http://localhost:3000 in your browser. For this chart the default credentials are admin / prom-operator, stored in the kube-prometheus-stack-grafana secret (change them immediately). Out-of-the-box, you'll find dashboards for:

  • Cluster resource usage
  • Pod health and restarts
  • Kubernetes Control Plane metrics
  • Node performance

You can import community dashboards for more granular insights, e.g., Kubernetes Cluster Monitoring (ID: 315) from grafana.com/grafana/dashboards.

Example: Importing a Community Dashboard

  1. In Grafana, go to "Dashboards" → "Import".
  2. Enter the dashboard ID (315) and click "Load".
  3. Select Prometheus as the data source and click "Import".
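For reproducibility, dashboards can also be provisioned as code: the chart's Grafana sidecar watches for ConfigMaps carrying the grafana_dashboard label and loads any JSON they contain. A sketch (the dashboard JSON itself is elided; paste an export from Grafana):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-dashboards
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # picked up by the Grafana sidecar
data:
  cluster-monitoring.json: |
    { ... exported dashboard JSON goes here ... }
```

This keeps dashboards in version control alongside the rest of your cluster configuration.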

Step 3: Creating Custom Prometheus Alerts

Prometheus supports alerting rules written in YAML. These rules can detect critical situations like high pod restarts, CPU throttling, or node failures—triggering notifications via Alertmanager.

Example: Alert When a Deployment Has Zero Replicas

groups:
- name: kubernetes.rules
  rules:
  - alert: DeploymentReplicasLow
    expr: kube_deployment_status_replicas_available{namespace="default", deployment="my-app"} == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Deployment {{ $labels.deployment }} has no available replicas"
      description: "Deployment {{ $labels.deployment }} in namespace {{ $labels.namespace }} has zero available replicas for more than 5 minutes."

Save these rules (e.g., as custom-alert-rules.yaml). With the Kube-Prometheus Stack, Prometheus is managed by the Prometheus Operator, so rules are loaded declaratively rather than by editing prometheus.yml: either wrap them in a PrometheusRule custom resource, or pass them via additionalPrometheusRulesMap in your Helm values.
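As a concrete example, the rule above wrapped in a PrometheusRule resource; note the release label, which the chart's Prometheus uses by default to select rules (release: kube-prometheus-stack is an assumption matching the install command earlier):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alert-rules
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match the Prometheus rule selector
spec:
  groups:
  - name: kubernetes.rules
    rules:
    - alert: DeploymentReplicasLow
      expr: kube_deployment_status_replicas_available{namespace="default", deployment="my-app"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Deployment {{ $labels.deployment }} has no available replicas"
```

Apply it with kubectl apply -f, and the operator reloads Prometheus automatically.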

Step 4: Visualizing Custom Metrics from Your Application

Expose custom metrics from your applications by instrumenting your code with a Prometheus client library. Here’s a Python example using prometheus_client:

from prometheus_client import start_http_server, Counter

REQUESTS = Counter('http_requests_total', 'Total HTTP Requests')

def handle_request():
    REQUESTS.inc()
    # handle your request logic

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics on port 8000
    # App code here

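Under the hood, /metrics is just plain text in the Prometheus exposition format; a stdlib-only sketch of roughly what the client library renders for the counter above (a simplified illustration, not the library's actual internals):

```python
def render_metrics(request_count: int) -> str:
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP http_requests_total Total HTTP Requests\n"
        "# TYPE http_requests_total counter\n"
        f"http_requests_total {request_count}\n"
    )

print(render_metrics(42))
```

Prometheus parses exactly this format on every scrape, which is why any HTTP endpoint that emits it can be monitored.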
Deploy your app, then configure Prometheus to scrape /metrics from your pods. With the Prometheus Operator (part of the Kube-Prometheus Stack), this is done declaratively with a PodMonitor or ServiceMonitor resource:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-app-podmonitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # the chart's Prometheus only selects labeled monitors by default
spec:
  namespaceSelector:
    matchNames:
      - default                      # namespace where the app's pods run
  selector:
    matchLabels:
      app: my-app
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
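The PodMonitor matches pods by label and by a named port, so the workload's pod template needs both; a minimal Deployment fragment to pair with it (the image is a placeholder):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app                        # matched by the PodMonitor selector
    spec:
      containers:
      - name: my-app
        image: my-registry/my-app:latest   # placeholder image
        ports:
        - name: metrics                    # matched by podMetricsEndpoints.port
          containerPort: 8000
```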

Step 5: Enabling Alerting and Notification Channels

Alertmanager, part of the Kube-Prometheus Stack, routes alerts to notification channels: email, Slack, PagerDuty, and more. To configure Slack notifications, edit the Alertmanager config (values.yaml):

alertmanager:
  config:
    receivers:
      - name: 'slack-notifications'
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
            channel: '#alerts'
    route:
      receiver: 'slack-notifications'

After updating your Helm release, Alertmanager will notify your team in Slack when alerts fire.
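In practice you will usually want the route tree to fan out by label so that only high-severity alerts interrupt anyone; a sketch extending the config above (the timing values are common starting points, not chart requirements):

```yaml
alertmanager:
  config:
    route:
      receiver: 'slack-notifications'
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      repeat_interval: 4h
      routes:
        - matchers:
            - severity = "critical"
          receiver: 'slack-notifications'   # swap for a paging receiver in production
```

Grouping related alerts and routing by severity keeps the channel useful instead of noisy.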

Best Practices for Kubernetes Monitoring

  • Monitor both infrastructure and application metrics for a complete picture.
  • Set up SLO-driven alerts to minimize alert fatigue and focus on user-impacting issues.
  • Automate dashboard provisioning using Grafana as Code or Helm charts for reproducibility.
  • Secure your monitoring stack with RBAC, network policies, and authentication for Grafana and Prometheus endpoints.
  • Continuously review and tune alerts to avoid noise and ensure actionable notifications.
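As an illustration of an SLO-driven alert, the rule below fires on the error ratio users actually experience rather than on raw resource metrics; http_requests_total and its code label are assumptions about how your application is instrumented:

```yaml
- alert: HighErrorRate
  expr: |
    sum(rate(http_requests_total{code=~"5.."}[5m]))
      / sum(rate(http_requests_total[5m])) > 0.01
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: "More than 1% of requests failing for 10 minutes"
```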

Conclusion

Monitoring Kubernetes clusters with Prometheus and Grafana is essential for modern DevOps and SRE teams. By following the steps above, you can deploy a complete observability solution, track key metrics, visualize trends, and proactively address issues before they impact users. Start small—deploy the stack, import ready-to-use dashboards, and incrementally add custom metrics and alerting as your monitoring maturity grows.

Ready to take your observability to the next level? Try deploying the Kube-Prometheus Stack today and start building dashboards that matter to your team.