Unified Monitoring for Multi-Cloud Ecosystems

As a South African SRE working with teams spread across Johannesburg, Cape Town, and a couple of European regions, I’ve seen first-hand how quickly complexity explodes when you adopt a multi-cloud strategy. Unified Monitoring for Multi-Cloud Ecosystems is not a nice-to-have; it’s the only way to keep your mean time to detect (MTTD) and mean time to resolve (MTTR) under control when your workloads span AWS, Azure, GCP, and on‑prem.

In this post, I’ll walk through how I approach Unified Monitoring for Multi-Cloud Ecosystems using Grafana as the primary observability front-end, backed by Prometheus and OpenTelemetry, with practical examples and snippets you can adapt.

Why Unified Monitoring for Multi-Cloud Ecosystems Matters

Multi-cloud is great for negotiating pricing, avoiding lock-in, and placing workloads closer to African users via different providers’ POPs. But without unified monitoring, you end up with:

  • Separate dashboards for CloudWatch, Azure Monitor, and Google Cloud Monitoring
  • Different metric names and labels for the same concept (CPU, latency, errors)
  • Inconsistent alerting and incident workflows per provider

Unified Monitoring for Multi-Cloud Ecosystems means building a single observability layer where:

  • All telemetry (metrics, logs, traces, events) lands in a central system or tightly integrated stack[2]
  • Cloud-native tools (CloudWatch, Azure Monitor, Cloud Monitoring) act as sources, not destinations[2][1]
  • Dashboards, alerts, and SLOs are defined once and reused across clouds[2][4]

Reference Architecture with Grafana

A practical pattern I use for Unified Monitoring for Multi-Cloud Ecosystems is:

  1. Deploy a consistent telemetry agent stack (e.g. OpenTelemetry Collector + Prometheus node/agent + Fluent Bit) into each cloud[2].
  2. Standardise labels and naming across clouds (e.g. cloud_provider, region, env)[4].
  3. Forward metrics, logs, and traces to a central observability backend (Grafana stack, Grafana Cloud, or similar)[2][6].
  4. Use Grafana as the “single pane of glass” for cross-cloud dashboards and SLOs[1][2][4].
  5. Define unified alerting rules that do not depend on cloud-specific quirks[2][4].
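
As a rough sketch of steps 1 to 3, here is what an OpenTelemetry Collector config for one AWS cluster might look like. It assumes the contrib distribution of the Collector (for the resource processor and the prometheusremotewrite exporter) and a hypothetical remote-write endpoint at metrics.example.internal; the label values are placeholders you would change per cloud.

```yaml
# Illustrative only: one Collector per cluster, with values adjusted per cloud.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  # Stamp every metric with the shared labels before it leaves the cloud.
  resource:
    attributes:
      - key: cloud_provider
        value: aws
        action: upsert
      - key: region
        value: af-south-1
        action: upsert
      - key: env
        value: prod
        action: upsert
  batch:

exporters:
  prometheusremotewrite:
    # Placeholder URL for the central backend (an assumption, not a real endpoint).
    endpoint: https://metrics.example.internal/api/v1/write
    # Copy resource attributes onto each metric as Prometheus labels.
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [prometheusremotewrite]
```

Setting the labels in the Collector rather than in each application keeps the convention in one place per cluster, which is usually easier to audit.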

In South African environments, I often host the central stack in a region with good latency to both local and overseas workloads (for example, an EU or Middle East region), while keeping sensitive data in a South African region where POPIA obligations call for it.

Connecting AWS, Azure, and GCP into Grafana

Configuring Grafana Data Sources

Grafana provides native data sources for AWS CloudWatch, Azure Monitor, and Google Cloud Monitoring, which makes it well-suited for Unified Monitoring for Multi-Cloud Ecosystems[1][3]. The trick is to treat these providers as “raw feeds” and then normalise the data using dashboards, transformations, and labels.

AWS CloudWatch Data Source Example

In AWS, I typically create an IAM role with read-only CloudWatch permissions and let Grafana assume that role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics"
      ],
      "Resource": "*"
    }
  ]
}

Then in Grafana’s provisioning/datasources/aws.yml:

apiVersion: 1

datasources:
  - name: AWS CloudWatch
    type: cloudwatch
    access: proxy
    isDefault: false
    jsonData:
      authType: default
      defaultRegion: af-south-1
      assumeRoleArn: arn:aws:iam::123456789012:role/grafana-cloudwatch-read

Note the af-south-1 default region. If you run workloads in AWS Africa (Cape Town) alongside other regions, make sure your dashboards can filter by region so the same panels work for af-south-1, eu-west-1, and so on.

Azure Monitor Data Source Example

For Azure, I register an Azure AD (now Microsoft Entra ID) app, assign it the Monitoring Reader role on the subscription, and map it in Grafana:

apiVersion: 1

datasources:
  - name: Azure Monitor
    type: grafana-azure-monitor-datasource
    access: proxy
    jsonData:
      cloudName: azuremonitor
      tenantId: <tenant-id>
      clientId: <client-id>
      subscriptionId: <subscription-id>
      azureLogAnalyticsWorkspace: <workspace-id>
    secureJsonData:
      clientSecret: <client-secret>

Google Cloud Monitoring (Stackdriver) Data Source Example

For GCP, I create a service account with the Monitoring Viewer role and import the JSON key into Grafana:

apiVersion: 1

datasources:
  - name: GCP Monitoring
    type: stackdriver  # the plugin ID keeps the legacy Stackdriver name
    access: proxy
    jsonData:
      authenticationType: jwt
      defaultProject: <project-id>
      clientEmail: <service-account-email>
      tokenUri: https://oauth2.googleapis.com/token
    secureJsonData:
      privateKey: |
        -----BEGIN PRIVATE KEY-----
        <private-key>
        -----END PRIVATE KEY-----

With these three data sources, you can start building dashboards that slice across cloud_provider, region, and service, giving you the backbone for Unified Monitoring for Multi-Cloud Ecosystems[1][2].
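
The three data sources above query each provider's own API. The PromQL used later in this post needs a Prometheus-compatible data source pointing at the central backend. A minimal provisioning sketch, assuming a hypothetical endpoint at metrics.example.internal and a data source name of my own choosing, might look like this:

```yaml
apiVersion: 1

datasources:
  - name: Central Prometheus        # illustrative name
    type: prometheus
    access: proxy
    url: https://metrics.example.internal   # placeholder URL, replace with your backend
    isDefault: true
    jsonData:
      httpMethod: POST              # POST avoids URL length limits on long queries
      timeInterval: 30s             # should match your scrape interval
```

Authentication is left out here because it depends entirely on how the backend is exposed.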

Standardising Metrics and Labels

If you only remember one thing from this article, let it be this: standardise your telemetry naming and labels across clouds[4].

Here’s a convention I use for infrastructure metrics:

  • Metric names: infra_cpu_utilization, infra_memory_usage_bytes, infra_disk_usage_ratio
  • Core labels:
    • cloud_provider: aws | azure | gcp | onprem
    • region: Cloud-specific region ID
    • env: prod | staging | dev
    • service: Logical service name
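
One way to arrive at these names, assuming your nodes run node_exporter and the core labels are already attached at scrape time, is a set of Prometheus recording rules. The group name and the filesystem filter below are illustrative choices, not part of the convention:

```yaml
groups:
  - name: infra_standard_metrics    # hypothetical group name
    rules:
      # Fraction of CPU time spent non-idle per node, in the range 0 to 1.
      - record: infra_cpu_utilization
        expr: |
          1 - avg by (cloud_provider, region, env, instance) (
            rate(node_cpu_seconds_total{mode="idle"}[5m])
          )
      # Memory in use, in bytes.
      - record: infra_memory_usage_bytes
        expr: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
      # Fraction of each filesystem in use, skipping ephemeral mounts.
      - record: infra_disk_usage_ratio
        expr: |
          1 - (
            node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}
          )
```

Each rule emits a gauge, so dashboards would typically aggregate these with avg or max.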

For Kubernetes workloads that run in multiple clouds, I use Prometheus relabeling to inject these labels as they’re scraped.

Prometheus Scrape Config Example for Multi-Cloud

scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__meta_kubernetes_node_label_cloud_provider]
        target_label: cloud_provider
      - source_labels: [__meta_kubernetes_node_label_topology_kubernetes_io_region]
        target_label: region
      - source_labels: [__meta_kubernetes_node_label_env]
        target_label: env

In each cluster (AWS EKS, Azure AKS, GKE), I ensure the nodes are labelled consistently:

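# Managed clusters usually set topology.kubernetes.io/region already;
# --overwrite keeps the command from failing when a label exists.
kubectl label node <node-name> --overwrite \
  cloud_provider=aws \
  topology.kubernetes.io/region=af-south-1 \
  env=prod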

Now a single Grafana panel can visualize CPU utilization across all providers:

avg by (cloud_provider, region) (avg_over_time(infra_cpu_utilization[5m]))

This is the practical heart of Unified Monitoring for Multi-Cloud Ecosystems: one query, multiple clouds.

Unified SLOs and Alerting

Once metrics are standardised, you can define SLOs and alerts that apply across clouds, instead of maintaining per-provider rules[2][4]. For example, a global