Unified Monitoring for Multi-Cloud Ecosystems
As a South African SRE working with teams spread across Johannesburg, Cape Town, and a couple of European regions, I’ve seen first-hand how quickly complexity explodes when you adopt a multi-cloud strategy. Unified Monitoring for Multi-Cloud Ecosystems is not a nice-to-have; it’s the only way to keep your mean time to detect (MTTD) and mean time to resolve (MTTR) under control when your workloads span AWS, Azure, GCP, and on‑prem.
In this post, I’ll walk through how I approach Unified Monitoring for Multi-Cloud Ecosystems using Grafana as the primary observability front-end, backed by Prometheus and OpenTelemetry, with practical examples and snippets you can adapt.
Why Unified Monitoring for Multi-Cloud Ecosystems Matters
Multi-cloud is great for negotiating pricing, avoiding lock-in, and placing workloads closer to African users via different providers’ POPs. But without unified monitoring, you end up with:
- Separate dashboards for CloudWatch, Azure Monitor, and Google Cloud Monitoring
- Different metric names and labels for the same concept (CPU, latency, errors)
- Inconsistent alerting and incident workflows per provider
Unified Monitoring for Multi-Cloud Ecosystems means building a single observability layer where:
- All telemetry (metrics, logs, traces, events) lands in a central system or tightly integrated stack[2]
- Cloud-native tools (CloudWatch, Azure Monitor, Cloud Monitoring) act as sources, not destinations[2][1]
- Dashboards, alerts, and SLOs are defined once and reused across clouds[2][4]
Reference Architecture with Grafana
A practical pattern I use for Unified Monitoring for Multi-Cloud Ecosystems is:
- Deploy a consistent telemetry agent stack (e.g. OpenTelemetry Collector + Prometheus node/agent + Fluent Bit) into each cloud[2].
- Standardise labels and naming across clouds (e.g. cloud_provider, region, env)[4].
- Forward metrics, logs, and traces to a central observability backend (Grafana stack, Grafana Cloud, or similar)[2][6].
- Use Grafana as the “single pane of glass” for cross-cloud dashboards and SLOs[1][2][4].
- Define unified alerting rules that do not depend on cloud-specific quirks[2][4].
In South African environments, I often host the central stack in a region with good latency to both local and foreign regions (for example, an EU or Middle East region), while keeping sensitive data in a local POP when POPIA (the Protection of Personal Information Act) requires it.
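The data-residency split boils down to a routing decision per record. The sketch below is a hypothetical illustration only: the `data_class` label, both endpoint URLs, and the `route` helper are assumptions, and in a real deployment this logic would normally live in your collector pipeline rather than in application code.

```python
from dataclasses import dataclass, field

# Hypothetical backends: replace with your own in-country and central endpoints.
LOCAL_ENDPOINT = "https://telemetry.af-south-1.example.internal"
CENTRAL_ENDPOINT = "https://telemetry.eu-west-1.example.internal"


@dataclass
class TelemetryRecord:
    name: str
    labels: dict = field(default_factory=dict)


def route(record: TelemetryRecord) -> str:
    """Return the backend a record should be shipped to.

    Assumes a hypothetical `data_class` label set at the source: records
    marked "personal" stay in-country, everything else goes central.
    """
    if record.labels.get("data_class") == "personal":
        return LOCAL_ENDPOINT
    return CENTRAL_ENDPOINT


if __name__ == "__main__":
    audit_log = TelemetryRecord("checkout_audit_log", {"data_class": "personal"})
    cpu = TelemetryRecord("infra_cpu_utilization", {"region": "af-south-1"})
    print(route(audit_log))  # in-country endpoint
    print(route(cpu))        # central endpoint
```

Keeping the decision keyed on a label (rather than on the metric name) means new signal types inherit the residency rule as soon as they are tagged at the source.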
Connecting AWS, Azure, and GCP into Grafana
Configuring Grafana Data Sources
Grafana provides native data sources for AWS CloudWatch, Azure Monitor, and Google Cloud Monitoring, which makes it well-suited for Unified Monitoring for Multi-Cloud Ecosystems[1][3]. The trick is to treat these providers as “raw feeds” and then normalise the data using dashboards, transformations, and labels.
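To make "normalise" concrete, here is a small sketch of the mapping step for CPU. The native names are the standard VM CPU metrics each provider exposes: CloudWatch `AWS/EC2` `CPUUtilization` and Azure `Percentage CPU` are percentages, while GCP's `compute.googleapis.com/instance/cpu/utilization` is a ratio between 0 and 1. The `normalise_cpu` helper and the choice of a 0 to 1 ratio as the unified unit are my assumptions, following the `infra_*` naming used later in this post; inside Grafana you would express the same scaling with transformations or query math.

```python
# Native CPU metric -> divisor that converts it to a 0..1 ratio.
# Keys are (cloud_provider, namespace, metric); the GCP metric type
# "compute.googleapis.com/instance/cpu/utilization" is split the same way.
NATIVE_CPU_METRICS = {
    ("aws", "AWS/EC2", "CPUUtilization"): 100.0,  # percent
    ("azure", "Microsoft.Compute/virtualMachines", "Percentage CPU"): 100.0,  # percent
    ("gcp", "compute.googleapis.com", "instance/cpu/utilization"): 1.0,  # ratio
}


def normalise_cpu(provider: str, namespace: str, metric: str, value: float,
                  region: str, env: str, service: str) -> dict:
    """Translate one native CPU sample into the unified infra_* shape (hypothetical helper)."""
    divisor = NATIVE_CPU_METRICS.get((provider, namespace, metric))
    if divisor is None:
        raise KeyError(f"no CPU mapping for {provider}: {namespace}/{metric}")
    return {
        "name": "infra_cpu_utilization",
        "labels": {"cloud_provider": provider, "region": region,
                   "env": env, "service": service},
        "value": value / divisor,
    }


if __name__ == "__main__":
    aws = normalise_cpu("aws", "AWS/EC2", "CPUUtilization", 72.5,
                        "af-south-1", "prod", "checkout")
    gcp = normalise_cpu("gcp", "compute.googleapis.com", "instance/cpu/utilization",
                        0.31, "europe-west1", "prod", "checkout")
    print(aws["value"], gcp["value"])  # both now on the same 0..1 scale
```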
AWS CloudWatch Data Source Example
In AWS, I typically create an IAM role with read-only CloudWatch permissions and let Grafana assume that role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics",
        "cloudwatch:GetMetricStatistics"
      ],
      "Resource": "*"
    }
  ]
}
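The permissions policy controls what the role can do; the role also needs a trust policy that says who may assume it. Below is a minimal sketch that builds one, assuming Grafana runs under its own IAM role (the principal ARN uses AWS's documentation account ID and is a placeholder) and optionally pinning an external ID, which the CloudWatch data source lets you configure alongside the assume-role ARN.

```python
import json
from typing import Optional


def grafana_trust_policy(grafana_principal_arn: str,
                         external_id: Optional[str] = None) -> dict:
    """Build an IAM trust policy letting Grafana's principal assume the read-only role."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": grafana_principal_arn},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        # Only allow AssumeRole calls that present the agreed external ID.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Placeholder principal: substitute the role Grafana actually runs as.
    policy = grafana_trust_policy("arn:aws:iam::111122223333:role/grafana-server",
                                  external_id="grafana-cloudwatch")
    print(json.dumps(policy, indent=2))
```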
Then in Grafana’s provisioning/datasources/aws.yml:
apiVersion: 1
datasources:
  - name: AWS CloudWatch
    type: cloudwatch
    access: proxy
    isDefault: false
    jsonData:
      authType: default
      defaultRegion: af-south-1
      assumeRoleArn: arn:aws:iam::123456789012:role/grafana-cloudwatch-read
Note the af-south-1 region – when you’re running workloads in the AWS Africa (Cape Town) region, ensure your dashboards can filter by region so they work for af-south-1, eu-west-1, etc.
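One way to get that filter is a dashboard variable. The sketch below builds a Grafana `custom` template variable in dashboard JSON; the variable name `region` and the region list are assumptions, and panels would reference it as `$region` (for example `infra_cpu_utilization{region=~"$region"}` in a Prometheus query).

```python
import json


def region_variable(regions: list) -> dict:
    """Grafana dashboard JSON for a multi-select `custom` variable named `region`."""
    return {
        "name": "region",
        "label": "Region",
        "type": "custom",
        # Custom variables take their options as a comma-separated list.
        "query": ",".join(regions),
        "multi": True,
        "includeAll": True,
    }


if __name__ == "__main__":
    templating = {"templating": {"list": [region_variable(["af-south-1", "eu-west-1"])]}}
    print(json.dumps(templating, indent=2))
```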
Azure Monitor Data Source Example
For Azure, I register an Azure AD (now Microsoft Entra ID) app, grant it the Monitoring Reader role, and map it in Grafana:
apiVersion: 1
datasources:
  - name: Azure Monitor
    type: grafana-azure-monitor-datasource
    access: proxy
    jsonData:
      azureAuthType: clientsecret
      cloudName: azuremonitor
      tenantId: <tenant-id>
      clientId: <client-id>
      subscriptionId: <subscription-id>
      azureLogAnalyticsWorkspace: <workspace-id>
    secureJsonData:
      clientSecret: <client-secret>
Google Cloud Monitoring (Stackdriver) Data Source Example
For GCP, I create a service account with the Monitoring Viewer role and import the JSON key into Grafana:
apiVersion: 1
datasources:
  - name: GCP Monitoring
    type: stackdriver
    access: proxy
    jsonData:
      authenticationType: jwt
      defaultProject: <project-id>
      clientEmail: <service-account-email>
      tokenUri: https://oauth2.googleapis.com/token
    secureJsonData:
      privateKey: |
        -----BEGIN PRIVATE KEY-----
        <private-key>
        -----END PRIVATE KEY-----
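Copying fields out of the service-account key by hand is error-prone. Here is a small sketch that maps a downloaded key onto the `jsonData`/`secureJsonData` fields used above. `type`, `project_id`, `client_email`, `token_uri`, and `private_key` are standard fields of a GCP service-account key; the `gcp_datasource_fields` helper and the dummy sample key are illustrative assumptions.

```python
import json


def gcp_datasource_fields(key_json: str) -> dict:
    """Map a GCP service-account key (JSON text) to Grafana data source settings."""
    key = json.loads(key_json)
    if key.get("type") != "service_account":
        raise ValueError("expected a service account key file")
    return {
        "jsonData": {
            "authenticationType": "jwt",
            "defaultProject": key["project_id"],
            "clientEmail": key["client_email"],
            "tokenUri": key["token_uri"],
        },
        # The private key belongs in secureJsonData so Grafana stores it encrypted.
        "secureJsonData": {"privateKey": key["private_key"]},
    }


if __name__ == "__main__":
    # Dummy key for illustration only; load your real key file instead.
    sample = json.dumps({
        "type": "service_account",
        "project_id": "example-project",
        "client_email": "grafana@example-project.iam.gserviceaccount.com",
        "token_uri": "https://oauth2.googleapis.com/token",
        "private_key": "-----BEGIN PRIVATE KEY-----\n<private-key>\n-----END PRIVATE KEY-----\n",
    })
    print(json.dumps(gcp_datasource_fields(sample)["jsonData"], indent=2))
```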
With these three data sources, you can start building dashboards that slice across cloud_provider, region, and service, giving you the backbone for Unified Monitoring for Multi-Cloud Ecosystems[1][2].
Standardising Metrics and Labels
If you only remember one thing from this article, let it be this: standardise your telemetry naming and labels across clouds[4].
Here’s a convention I use for infrastructure metrics:
- Metric names: infra_cpu_utilization, infra_memory_usage_bytes, infra_disk_usage_ratio
- Core labels:
  - cloud_provider: aws | azure | gcp | onprem
  - region: cloud-specific region ID
  - env: prod | staging | dev
  - service: logical service name
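A convention only helps if it is enforced. Here is a hedged sketch of a check you could run in CI or against a sample of exported series. The allowed values mirror the list above, while the `convention_errors` helper and the `infra_` prefix rule (meant for infrastructure metrics only) are my assumptions about how to encode it.

```python
import re

REQUIRED_LABELS = ("cloud_provider", "region", "env", "service")
ALLOWED_VALUES = {
    "cloud_provider": {"aws", "azure", "gcp", "onprem"},
    "env": {"prod", "staging", "dev"},
}
# Assumed rule for infrastructure metrics: lower-case, infra_ prefix.
INFRA_METRIC_NAME = re.compile(r"^infra_[a-z0-9_]+$")


def convention_errors(name: str, labels: dict) -> list:
    """Return human-readable violations of the naming/label convention."""
    errors = []
    if not INFRA_METRIC_NAME.match(name):
        errors.append(f"metric name {name!r} does not match infra_*")
    for key in REQUIRED_LABELS:
        if not labels.get(key):
            errors.append(f"missing label {key!r}")
    for key, allowed in ALLOWED_VALUES.items():
        value = labels.get(key)
        if value and value not in allowed:
            errors.append(f"label {key}={value!r} not in {sorted(allowed)}")
    return errors


if __name__ == "__main__":
    print(convention_errors("CPUUtilization",
                            {"cloud_provider": "AWS", "region": "af-south-1", "env": "prod"}))
```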
For Kubernetes workloads that run in multiple clouds, I use Prometheus relabeling to inject these labels as they’re scraped.
Prometheus Scrape Config Example for Multi-Cloud
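scrape_configs:
  - job_name: 'kubernetes-nodes'
    # Assumes Prometheus runs in-cluster: kubelets serve metrics over HTTPS
    # and expect the pod's service-account token.
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    authorization:
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Copy node labels into the cross-cloud label convention.
      - source_labels: [__meta_kubernetes_node_label_cloud_provider]
        target_label: cloud_provider
      - source_labels: [__meta_kubernetes_node_label_topology_kubernetes_io_region]
        target_label: region
      - source_labels: [__meta_kubernetes_node_label_env]
        target_label: env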
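If the relabeling looks opaque, here is a simplified pure-Python model of what happens to one node. It captures only two behaviours: how Kubernetes service discovery turns a node label such as `topology.kubernetes.io/region` into `__meta_kubernetes_node_label_topology_kubernetes_io_region`, and how the default `replace` action copies a value into the target label. The function names are hypothetical, and the model ignores other relabel actions, separators, and custom regexes.

```python
import re

# Same source -> target pairs as the scrape config above.
RELABELS = [
    ("__meta_kubernetes_node_label_cloud_provider", "cloud_provider"),
    ("__meta_kubernetes_node_label_topology_kubernetes_io_region", "region"),
    ("__meta_kubernetes_node_label_env", "env"),
]


def sanitise(label_name: str) -> str:
    """Replace characters that are invalid in Prometheus label names with '_'."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", label_name)


def node_meta_labels(node_labels: dict) -> dict:
    """Expose Kubernetes node labels the way the node role of kubernetes_sd does."""
    return {f"__meta_kubernetes_node_label_{sanitise(k)}": v
            for k, v in node_labels.items()}


def apply_relabels(meta: dict) -> dict:
    """Apply the default 'replace' action: regex (.*), replacement $1."""
    result = {}
    for source, target in RELABELS:
        value = meta.get(source, "")  # a missing source label reads as ""
        if value:                     # an empty result removes the target label
            result[target] = value
    return result


if __name__ == "__main__":
    node = {"cloud_provider": "aws", "topology.kubernetes.io/region": "af-south-1",
            "env": "prod"}
    print(apply_relabels(node_meta_labels(node)))
    # {'cloud_provider': 'aws', 'region': 'af-south-1', 'env': 'prod'}
```

The practical consequence: a node missing the `env` label simply produces series without `env`, rather than failing the scrape, so the convention check from earlier is still worth running.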
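In each cluster (AWS EKS, Azure AKS, GKE), I ensure the nodes are labelled consistently. Managed node pools usually set topology.kubernetes.io/region already, so --overwrite stops kubectl from rejecting the existing label; for autoscaled pools, set the same labels in the node-pool configuration so replacement nodes inherit them:
kubectl label node <node-name> --overwrite \
  cloud_provider=aws \
  topology.kubernetes.io/region=af-south-1 \
  env=prod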
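Now a single Grafana panel can visualise CPU utilisation across all providers. Because infra_cpu_utilization is a gauge rather than a counter, aggregate it directly instead of wrapping it in rate():
avg by (cloud_provider, region) (infra_cpu_utilization)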
This is the practical heart of Unified Monitoring for Multi-Cloud Ecosystems: one query, multiple clouds.
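To see what the grouping does, here is a toy emulation of averaging a gauge such as `infra_cpu_utilization` `by (cloud_provider, region)`: per-node series collapse into one value per provider and region. The sample values are made up, and real PromQL also handles staleness and lookback windows, which this sketch ignores.

```python
from collections import defaultdict
from statistics import mean


def avg_by(samples, by):
    """Group (labels, value) samples on the `by` labels and average each group.

    Like PromQL, a series missing a grouping label is grouped under "".
    """
    groups = defaultdict(list)
    for labels, value in samples:
        key = tuple(labels.get(name, "") for name in by)
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    samples = [
        ({"cloud_provider": "aws", "region": "af-south-1", "instance": "node-a"}, 0.40),
        ({"cloud_provider": "aws", "region": "af-south-1", "instance": "node-b"}, 0.60),
        ({"cloud_provider": "gcp", "region": "europe-west1", "instance": "node-c"}, 0.30),
    ]
    print(avg_by(samples, ("cloud_provider", "region")))
```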
Unified SLOs and Alerting
Once metrics are standardised, you can define SLOs and alerts that apply across clouds, instead of maintaining per-provider rules[2][4]. For example, a global