Managing Terraform Across Multiple Cloud Providers
Most organisations don't live in a single cloud. You might run compute in AWS, DNS in Cloudflare, identity in Azure AD, and logging in GCP. Terraform handles each provider fine on its own, but the moment you need to coordinate across providers the tooling fights you.
This guide walks through the common pain points of multi-cloud Terraform setups and the approaches teams use to cope — then shows how Snap CD makes cross-cloud dependency management a solved problem.
Where it gets difficult
Credential sprawl
Each cloud provider has its own authentication mechanism. AWS uses IAM roles and access keys. Azure uses service principals and managed identities. GCP uses service accounts and workload identity federation. A single Terraform state that spans providers needs credentials for all of them — which means your CI runner or developer workstation holds keys to everything.
That's a security problem. A compromised CI pipeline with AWS and Azure credentials exposes both clouds simultaneously. It's also an operational problem: rotating credentials means updating every pipeline that touches that state. The problem compounds at scale. Each provider configuration is bound to one set of credentials and runs as its own provider process, so managing hundreds of accounts across clouds (often multiplied by regions) can mean thousands of provider configurations and processes, which quickly becomes unmanageable.
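To make the credential problem concrete, here is a minimal sketch of a single root module that spans all three clouds. The project ID and regions are placeholders rather than values from this guide, and each provider is assumed to pick up its credentials from the environment of whatever process runs terraform.

```hcl
terraform {
  required_providers {
    aws     = { source = "hashicorp/aws" }
    azurerm = { source = "hashicorp/azurerm" }
    google  = { source = "hashicorp/google" }
  }
}

# Needs AWS credentials (for example an IAM role or access keys).
provider "aws" {
  region = "us-east-1"
}

# Needs Azure credentials (for example a service principal).
provider "azurerm" {
  features {}
}

# Needs GCP credentials (for example a service account).
provider "google" {
  project = "my-logging-project" # hypothetical project ID
  region  = "us-central1"
}
```

Any runner that plans or applies this module has to hold working credentials for all three providers at the same time, which is exactly the exposure described above.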
Provider version conflicts
Terraform providers are versioned independently. Upgrading the AWS provider to fix a bug in aws_eks_cluster shouldn't require you to also test a new version of the Azure provider. But when they share a state, terraform init -upgrade pulls the newest version allowed by the constraints for every provider at once, and a regression in one provider blocks all deployments. Terraform also has no built-in way to generate provider blocks in a loop, and a module called with for_each gets the same provider configuration for every instance, so each account or region needs its own hand-written provider alias and module call. That makes multi-cloud configurations especially verbose and repetitive.
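A short sketch of both points, under stated assumptions: the version constraints are illustrative, and ./modules/network is a hypothetical local module. The first block pins each provider separately; the rest shows the repetition that static provider configurations force on you.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # illustrative constraint
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0" # illustrative constraint
    }
  }
}

# One hand-written provider block per region (or account).
provider "aws" {
  alias  = "use1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "euw1"
  region = "eu-west-1"
}

# One module call per provider configuration: the providers argument
# must name a static configuration, so it cannot vary per for_each instance.
module "network_use1" {
  source    = "./modules/network" # hypothetical module path
  providers = { aws = aws.use1 }
}

module "network_euw1" {
  source    = "./modules/network" # hypothetical module path
  providers = { aws = aws.euw1 }
}
```

Even with separate constraints, both providers are resolved by the same terraform init, so an upgrade run still touches everything in the state.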
Blast radius across clouds
A misconfigured terraform apply in a single-cloud state damages resources in one cloud. A misconfigured apply in a multi-cloud state can damage resources everywhere. The blast radius scales with the number of providers in the state.
Slow plans
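By default, every terraform plan refreshes every resource in state. When your state contains resources across three clouds, the plan makes API calls to all three, and it cannot finish before the slowest provider does. Terraform also caps the number of concurrent operations (10 by default), so the three clouds compete for the same slots: in the worst case, a plan that takes 30 seconds per cloud takes closer to 90 seconds when they're all in one state.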
Typical approaches
Separate repos per cloud
The simplest split: one repo for AWS infrastructure, one for Azure, one for GCP. Each has its own state, its own CI pipeline, its own credentials.
infra-aws/ # VPCs, EKS, S3 buckets
infra-azure/ # AKS, Azure SQL, Key Vault
infra-gcp/ # GKE, Cloud SQL, BigQuery
This solves credential isolation and blast radius. But it introduces a new problem: cross-cloud dependencies. Your Azure DNS zone needs the IP address of an AWS load balancer. Your GCP logging sink needs the ARN of an AWS S3 bucket. These values have to flow between repos somehow.
Monorepo with directory-per-cloud
Keep everything in one repo but separate by directory. Each directory has its own state:
infra/
  aws/
    networking/
    compute/
  azure/
    dns/
    identity/
  gcp/
    logging/
Better for code organisation, but the dependency problem remains. You still need to pass outputs from aws/networking to azure/dns, and nothing in Terraform's native tooling orchestrates that for you.
terraform_remote_state across clouds
The built-in approach: each consuming state reads the producer's state directly.
data "terraform_remote_state" "aws_networking" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "aws/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "azurerm_dns_a_record" "api" {
  name                = "api"
  zone_name           = azurerm_dns_zone.main.name
  resource_group_name = azurerm_dns_zone.main.resource_group_name
  ttl                 = 300
  records             = [data.terraform_remote_state.aws_networking.outputs.load_balancer_ip]
}
This works but has the same drawbacks it always does:
- Every consumer needs the backend configuration of every producer — including cross-cloud backend access (Azure state reading from an S3 bucket needs AWS credentials too).
- No automatic re-plan when the upstream state changes. You have to trigger it manually or via CI glue.
- The dependency graph lives in your head, not in code.
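For the consumer above to work, the producing state has to publish the value as a root-module output, since terraform_remote_state only exposes root outputs. A minimal sketch of the producer side, assuming the load balancer's address comes from an Elastic IP named aws_eip.api (a hypothetical resource, not one from this guide):

```hcl
# infra/aws/networking/outputs.tf (hypothetical file layout)

output "load_balancer_ip" {
  description = "Public IP that the Azure DNS record should point at."
  value       = aws_eip.api.public_ip # hypothetical resource name
}
```

Renaming or removing this output does not fail in the producer. It only surfaces the next time a consumer plans, which is part of why the dependency graph ends up living in people's heads.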
Wrapper scripts and CI orchestration
When terraform_remote_state gets too painful, teams write wrapper scripts:
# Apply AWS networking first
cd infra/aws/networking
terraform apply -auto-approve
# Extract outputs
LB_IP=$(terraform output -raw load_balancer_ip)
# Apply Azure DNS with the output
cd ../../azure/dns
terraform apply -auto-approve -var="load_balancer_ip=$LB_IP"
Or they build multi-step CI pipelines that chain applies in order, passing outputs via pipeline variables or artifacts. This is fragile — the dependency graph is encoded in CI config, not infrastructure code. Adding a new dependency means editing the pipeline, not just the Terraform.
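The wrapper script also needs a matching variable on the consuming side. A sketch of what infra/azure/dns might declare, assuming the same azurerm_dns_zone.main resource as the earlier example; the file name and the validation rule are illustrative additions, not part of the original setup:

```hcl
# infra/azure/dns/variables.tf (hypothetical file layout)

variable "load_balancer_ip" {
  type        = string
  description = "Public IP of the AWS load balancer, passed in by the wrapper script via -var."

  validation {
    # Cheap shape check so an empty or garbled value fails at plan time.
    condition     = can(regex("^\\d{1,3}(\\.\\d{1,3}){3}$", var.load_balancer_ip))
    error_message = "The load_balancer_ip value must look like an IPv4 address."
  }
}

resource "azurerm_dns_a_record" "api" {
  name                = "api"
  zone_name           = azurerm_dns_zone.main.name
  resource_group_name = azurerm_dns_zone.main.resource_group_name
  ttl                 = 300
  records             = [var.load_balancer_ip]
}
```

The validation block catches one failure mode of script-driven wiring (an empty shell variable), but nothing here records where the value is supposed to come from. That knowledge stays in the script.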
How Snap CD handles multi-cloud
Snap CD was built for exactly this problem. Each cloud's infrastructure becomes one or more Snap CD Modules, each assigned to a Runner with the appropriate credentials. The dependency graph is declared in code via inputs, and Snap CD handles the orchestration.
One Runner per cloud
Deploy a Runner in each cloud environment with only the credentials it needs:
resource "snapcd_runner" "aws" {
  name = "aws-runner"
  # Deployed in AWS with an IAM role; only has AWS credentials
}

resource "snapcd_runner" "azure" {
  name = "azure-runner"
  # Deployed in Azure with a managed identity; only has Azure credentials
}
Each Runner only has access to its own cloud. A compromised AWS Runner can't touch Azure resources.
Modules per cloud component
Each piece of infrastructure is a Module, assigned to the appropriate Runner:
resource "snapcd_module" "aws_networking" {
  name         = "networking"
  namespace_id = snapcd_namespace.aws.id
  source_url   = "https://github.com/myorg/infra-aws-networking.git"
  runner_id    = snapcd_runner.aws.id
}

resource "snapcd_module" "azure_dns" {
  name         = "dns"
  namespace_id = snapcd_namespace.azure.id
  source_url   = "https://github.com/myorg/infra-azure-dns.git"
  runner_id    = snapcd_runner.azure.id
}
Cross-cloud dependencies as code
The load balancer IP from AWS flows into Azure DNS via a snapcd_module_input_from_output:
resource "snapcd_module_input_from_output" "lb_ip_to_dns" {
  module_id        = snapcd_module.azure_dns.id
  input_kind       = "Param"
  name             = "load_balancer_ip"
  output_module_id = snapcd_module.aws_networking.id
  output_name      = "load_balancer_ip"
}
With this in place:
- Snap CD knows to apply aws_networking before azure_dns.
- When aws_networking is applied and its load_balancer_ip output changes, azure_dns automatically re-plans and re-applies.
- No wrapper scripts. No CI orchestration. No cross-cloud terraform_remote_state.
- The AWS Runner never needs Azure credentials and vice versa.
A practical multi-cloud example
A common pattern: compute in AWS, DNS and identity in Azure, logging in GCP.
Namespace: aws
  Module: networking (runner: aws-runner)
  Module: compute (runner: aws-runner)
Namespace: azure
  Module: identity (runner: azure-runner)
  Module: dns (runner: azure-runner)
Namespace: gcp
  Module: logging (runner: gcp-runner)
Dependencies:
# compute needs vpc_id from networking
resource "snapcd_module_input_from_output" "vpc_to_compute" {
  module_id        = snapcd_module.compute.id
  input_kind       = "Param"
  name             = "vpc_id"
  output_module_id = snapcd_module.networking.id
  output_name      = "vpc_id"
}

# dns needs load_balancer_ip from compute
resource "snapcd_module_input_from_output" "lb_to_dns" {
  module_id        = snapcd_module.dns.id
  input_kind       = "Param"
  name             = "load_balancer_ip"
  output_module_id = snapcd_module.compute.id
  output_name      = "load_balancer_ip"
}

# logging needs cluster_name from compute and subscription_id from identity
resource "snapcd_module_input_from_output" "cluster_to_logging" {
  module_id        = snapcd_module.logging.id
  input_kind       = "Param"
  name             = "cluster_name"
  output_module_id = snapcd_module.compute.id
  output_name      = "cluster_name"
}

resource "snapcd_module_input_from_output" "sub_to_logging" {
  module_id        = snapcd_module.logging.id
  input_kind       = "Param"
  name             = "azure_subscription_id"
  output_module_id = snapcd_module.identity.id
  output_name      = "subscription_id"
}
A commit to infra-aws-networking triggers a cascade: networking re-applies, compute re-plans (because vpc_id might have changed), DNS re-plans if load_balancer_ip changed, and logging re-plans if cluster_name changed. Each step runs on the runner with the right credentials. No manual intervention.
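The dependency resources in this example refer to Modules and a GCP Runner that are not defined anywhere in this guide. A sketch of what those definitions might look like, reusing only the attributes shown in the earlier snapcd_runner and snapcd_module examples: the real provider schema may require more, the namespaces are assumed to be defined elsewhere, and the compute and logging repository URLs are hypothetical.

```hcl
# Sketch only: attributes mirror the earlier examples in this guide.

resource "snapcd_runner" "gcp" {
  name = "gcp-runner"
  # Deployed in GCP with a service account; only has GCP credentials
}

resource "snapcd_module" "networking" {
  name         = "networking"
  namespace_id = snapcd_namespace.aws.id # assumed to be defined elsewhere
  source_url   = "https://github.com/myorg/infra-aws-networking.git"
  runner_id    = snapcd_runner.aws.id
}

resource "snapcd_module" "compute" {
  name         = "compute"
  namespace_id = snapcd_namespace.aws.id
  source_url   = "https://github.com/myorg/infra-aws-compute.git" # hypothetical repo
  runner_id    = snapcd_runner.aws.id
}

resource "snapcd_module" "logging" {
  name         = "logging"
  namespace_id = snapcd_namespace.gcp.id # assumed to be defined elsewhere
  source_url   = "https://github.com/myorg/infra-gcp-logging.git" # hypothetical repo
  runner_id    = snapcd_runner.gcp.id
}
```

The identity and dns Modules would follow the same pattern with the azure namespace and the Azure Runner.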
Comparison
| | Single multi-cloud state | Separate repos + CI glue | Snap CD |
|---|---|---|---|
| Credential isolation | None — one set of creds for all clouds | Per-repo/pipeline | Per-Runner |
| Blast radius | All clouds | Single cloud | Single Module |
| Cross-cloud dependencies | Direct references | Scripts / CI variables | Declarative wiring |
| Automatic cascading | N/A (single state) | Manual triggers | Built-in |
| Plan speed | Slowest provider wins | Per-cloud | Per-Module |
Tips
- Start with one Runner per cloud. You can split further later (e.g., separate Runners for prod-aws and dev-aws), but one per cloud is the natural starting point.
- Keep cross-cloud dependencies narrow. A handful of outputs flowing between clouds (IPs, ARNs, resource IDs) is normal. If you're passing dozens, you might have a boundary in the wrong place.
- Use Namespaces to mirror your cloud structure. aws/networking, azure/dns, gcp/logging makes the dependency graph readable at a glance.
- Don't share state backends across clouds. An S3 backend for AWS state and an Azure Storage Account for Azure state is fine, because Snap CD manages the dependency graph, not the backends.
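As an illustration of the last tip, each root module can keep its state in the cloud it manages. The two fragments below belong to separate root modules. The S3 values reuse the earlier remote-state example, while the Azure resource group, storage account, and container names are placeholders.

```hcl
# infra/aws/networking/backend.tf (hypothetical file layout)
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "aws/networking/terraform.tfstate"
    region = "us-east-1"
  }
}
```

```hcl
# infra/azure/dns/backend.tf (hypothetical file layout)
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"       # placeholder
    storage_account_name = "mytfstateaccount" # placeholder
    container_name       = "tfstate"          # placeholder
    key                  = "azure/dns/terraform.tfstate"
  }
}
```

With this split, the Azure Runner never needs read access to the S3 bucket, and the AWS Runner never needs access to the storage account.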
See also
- Modular Deployments — how the Module and input system works in detail
- Self-Hosted Terraform Runners with Credential Isolation — per-cloud Runner deployment patterns
- The Problem with Large Terraform States — why splitting by cloud provider reduces plan time and API throttling
- Splitting a Terraform Monolith — how to break a multi-cloud monolith into per-cloud states