Scaling Terraform Infrastructure Beyond a Single Team

Karl Schriek · April 05, 2026

When a single engineer manages all the Terraform in an organisation, everything is simple. One repo, one state, one pipeline, one set of credentials. There's no coordination overhead because there's no one to coordinate with.

That stops working the moment a second team needs to deploy infrastructure. And by the time you have three or four teams — networking, platform, application, security — the single-team model is actively slowing everyone down.

This guide covers what breaks, how teams typically work around it, and how to set up a structure where each team owns their slice of infrastructure independently.

What breaks

State lock contention

Terraform's state locking is per-state. When the networking team is running terraform plan, the application team's pipeline is blocked — even though they're changing completely unrelated resources. The more teams share a state, the more time everyone spends waiting.

Blast radius

A junior engineer deploying a new application service shouldn't be able to accidentally destroy the VPC. But if application resources and networking resources share a state, a single misconfigured terraform apply can touch anything. Code review catches some of this. Not all of it.
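One partial guard that stays within a shared state is Terraform's prevent_destroy lifecycle setting. The sketch below assumes an AWS VPC with a hypothetical resource name and address range. It does not shrink the blast radius; it only makes a plan fail if it would destroy this particular resource.

```hcl
# Hypothetical shared VPC living in the same state as application resources.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # illustrative address range

  lifecycle {
    # Terraform rejects any plan that would destroy this resource.
    prevent_destroy = true
  }
}
```

The guard disappears if someone deletes the whole resource block, because the lifecycle setting is removed along with it. That is one reason separate states are the more robust fix.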

Credential sprawl

A shared pipeline needs credentials for everything: the networking team's Azure subscription, the application team's AWS account, the security team's DNS provider. Every team's secrets end up in one CI environment, accessible to anyone who can trigger a run. That conflicts with least-privilege requirements and is likely to be flagged in a compliance audit.

Approval bottlenecks

In many organisations, one person or a small group gatekeeps all infrastructure changes. Every PR needs their review. Every apply needs their approval. The gatekeeper becomes a bottleneck not because they're slow, but because they're a single point of serialisation for all infrastructure work.

Backend access as implicit access control

The Terraform CLI has no built-in concept of per-team or per-workspace permissions. All workspaces in a backend share the same credentials, so giving a user access to one workspace implicitly grants access to all of them. There's been a long-standing request to support separate backend configurations per workspace, and a related request to allow variables in backend configuration blocks; both are still open. Teams that need isolation end up managing a separate backend per team. That works, but now the cross-team dependency problem (how to pass outputs between backends) sits on top of the access control problem. Demand for a scalable multi-root-module architecture is real: OpenTofu's proposal to make terraliths a thing of the past has drawn significant community support.

Knowledge boundaries

The networking team understands route tables and peering. The application team understands container orchestration and databases. When both work in the same Terraform codebase, they need to understand each other's resources well enough to avoid breaking them. That cross-training is expensive and doesn't scale.

Typical approaches

Separate repos and pipelines per team

The most common first attempt: give each team their own repo, their own CI pipeline, and their own state backend. This solves the isolation problem but creates a new one — how do teams share outputs? The networking team produces a vpc_id that the application team needs.

Teams end up with one of:

Manual handoff: someone copies an output value into another team's terraform.tfvars. This is error-prone and doesn't trigger re-deploys when the upstream value changes.
terraform_remote_state: each consuming team configures a data source pointing at the producer's state backend. This tightly couples teams to each other's backend configuration and provides no change detection.
Shell scripts or CI glue: a pipeline runs terraform output on one state and feeds the result into terraform apply -var on the next. The dependency graph lives in CI configuration rather than in code, and it's fragile.
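To make the terraform_remote_state coupling concrete, here is a minimal sketch of what the consuming team ends up writing. The S3 bucket, key, region, and resource names are assumptions for illustration, not taken from any real setup.

```hcl
# Application team's configuration, reading the networking team's state.
# The consumer must know the producer's backend type, bucket, and key.
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "myorg-tfstate-platform"  # hypothetical bucket owned by the platform team
    key    = "prod/networking.tfstate" # hypothetical state key
    region = "eu-west-1"
  }
}

resource "aws_db_subnet_group" "app_a" {
  name       = "app-a"
  subnet_ids = data.terraform_remote_state.networking.outputs.subnet_ids
}
```

If the platform team renames the bucket or moves the state key, every consumer breaks. The data source also needs read access to the producer's entire state snapshot, not only its outputs, so the application team can read whatever sensitive values that state happens to contain.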

Workspaces

Terraform workspaces let you run the same configuration against multiple state files. Some teams use this to give each team their own workspace. But workspaces don't solve cross-team dependencies — they're designed for multiple instances of the same infrastructure (dev, staging, prod), not for splitting ownership of different infrastructure.
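A small sketch of the use case workspaces were designed for: one configuration, several environments, with sizing keyed on the selected workspace. The workspace names and instance counts are hypothetical.

```hcl
locals {
  # Name of the currently selected workspace, e.g. "dev", "staging", "prod".
  environment = terraform.workspace

  # Hypothetical per-environment sizing for the same infrastructure.
  instance_count = {
    dev     = 1
    staging = 1
    prod    = 3
  }
}

output "api_instance_count" {
  # Falls back to 1 for any workspace not listed above.
  value = lookup(local.instance_count, local.environment, 1)
}
```

Every workspace here still runs the same code against the same backend with the same credentials, which is why the pattern does not carry over to splitting ownership between teams.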

Terragrunt

Terragrunt adds a layer on top of Terraform that can manage dependencies between configurations. It works, but introduces its own complexity — terragrunt.hcl files, dependency blocks, wrapper commands. Teams now need to learn Terragrunt in addition to Terraform, and debugging requires understanding both layers. Your Terraform code also becomes coupled to Terragrunt's conventions.
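For reference, a cross-configuration dependency in Terragrunt looks roughly like the sketch below. The directory layout and repository URL are assumptions made for illustration.

```hcl
# app-a/database/terragrunt.hcl (hypothetical path)
terraform {
  source = "git::https://github.com/myorg/app-a-database.git"
}

# Points at the directory holding the networking team's terragrunt.hcl.
dependency "networking" {
  config_path = "../../platform/networking"
}

# Outputs of the dependency are passed to the module as input variables.
inputs = {
  vpc_id = dependency.networking.outputs.vpc_id
}
```

The dependency is declared in code rather than in CI, which is an improvement, but it lives in Terragrunt's own file format and is resolved by the Terragrunt wrapper rather than by Terraform itself.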

Platform team as intermediary

Some organisations create a platform team that owns all the Terraform and exposes a simplified interface (YAML files, internal portals, or custom tooling) to application teams. This can work well, but it means application teams can't deploy infrastructure directly — they file tickets or submit YAML and wait. The platform team becomes the bottleneck instead.

A better structure

The goal is straightforward: each team owns their own Terraform modules with their own state, credentials, and approval workflows, while cross-team dependencies are handled automatically.

Define ownership boundaries

Start by mapping teams to infrastructure boundaries:

Platform team       → networking, DNS, shared services
Application team A  → their databases, caches, storage
Application team B  → their databases, queues, functions
Security team       → IAM policies, compliance resources, audit logging

Each boundary becomes an independent Terraform root module with its own state. The platform team's networking module produces outputs (vpc_id, subnet_ids) that the application teams consume as inputs.
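As a rough sketch of what such a boundary looks like in plain Terraform, the producer declares its own backend and outputs, and the consumer declares matching input variables. The bucket name, region, CIDR ranges, and resource names below are assumptions; the two halves would live in separate repositories with separate states.

```hcl
# --- infra-networking: platform team's root module ---
terraform {
  backend "s3" {
    bucket = "myorg-tfstate-platform" # hypothetical, accessible to the platform team only
    key    = "prod/networking.tfstate"
    region = "eu-west-1"
  }
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
  count      = 2
  vpc_id     = aws_vpc.main.id
  cidr_block = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
}

output "vpc_id" {
  description = "ID of the shared VPC, consumed by application teams"
  value       = aws_vpc.main.id
}

output "subnet_ids" {
  description = "IDs of the private subnets, consumed by application teams"
  value       = aws_subnet.private[*].id
}

# --- app-a-database: application team A's root module (separate repo and state) ---
variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}
```

The application module takes plain input variables and knows nothing about where the networking state is stored. How those variables get populated is the dependency-wiring problem.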

Scope credentials per team

Each team's deployment environment should only have the credentials it needs. The platform team's runner has access to the networking subscription. Application team A's runner has access to their project's service account. No team has access to another team's cloud credentials.

This isn't just a security measure — it's an organisational one. When teams know they can't accidentally (or intentionally) touch resources outside their boundary, they move faster and with more confidence.
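One way to express this scoping in the Terraform code itself, assuming AWS, is to have each root module assume a role that only its own team's deployment environment is allowed to assume. The account ID and role name below are placeholders.

```hcl
# Application team A's root module. The runner's base identity is assumed
# to be permitted to assume this role and no other team's role.
provider "aws" {
  region = "eu-west-1"

  assume_role {
    role_arn     = "arn:aws:iam::111111111111:role/app-a-deployer" # placeholder
    session_name = "app-a-terraform"
  }
}
```

The role's IAM policy then defines the team's boundary, so even a misconfigured resource in this module cannot reach networking resources in another account.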

Scope approvals per team

The platform team should approve changes to networking. Application team A should approve changes to their own databases. Neither team should need the other's approval for changes within their boundary.

This requires an approval system that understands infrastructure boundaries — not just "can this user approve?" but "can this user approve changes to this specific module?"

Wire dependencies declaratively

When the platform team changes a subnet, the application teams that depend on those subnets should automatically re-plan and re-deploy. This should happen without the platform team needing to notify anyone, without the application teams needing to poll for changes, and without a CI pipeline encoding the dependency graph in YAML.

How Snap CD handles this

Snap CD's architecture maps directly to the multi-team structure described above.

Modules as ownership units

Each team's Terraform root module becomes a Snap CD Module. Modules are grouped into Namespaces within a Stack, creating a natural hierarchy:

resource "snapcd_stack" "prod" {
  name = "prod"
}

resource "snapcd_namespace" "platform" {
  name     = "platform"
  stack_id = snapcd_stack.prod.id
}

resource "snapcd_namespace" "app_a" {
  name     = "app-a"
  stack_id = snapcd_stack.prod.id
}

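# snapcd_runner.platform (and snapcd_runner.app_a below) are assumed to be
# defined elsewhere in this configuration, one Runner per team.
resource "snapcd_module" "networking" {
  name         = "networking"
  namespace_id = snapcd_namespace.platform.id
  source_url   = "https://github.com/myorg/infra-networking.git"
  runner_id    = snapcd_runner.platform.id
}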

resource "snapcd_module" "app_a_database" {
  name         = "database"
  namespace_id = snapcd_namespace.app_a.id
  source_url   = "https://github.com/myorg/app-a-database.git"
  runner_id    = snapcd_runner.app_a.id
}

Scoped permissions

Snap CD's RBAC system lets you assign roles at any level of the hierarchy — organisation, Stack, Namespace, or individual Module:

# Platform team owns their namespace
# (the snapcd_group resources are assumed to be defined elsewhere)
resource "snapcd_role_assignment" "platform_team" {
  principal_id = snapcd_group.platform_team.id
  role         = "Owner"
  scope_id     = snapcd_namespace.platform.id
  scope_type   = "Namespace"
}

# App team A owns their namespace
resource "snapcd_role_assignment" "app_a_team" {
  principal_id = snapcd_group.app_a_team.id
  role         = "Owner"
  scope_id     = snapcd_namespace.app_a.id
  scope_type   = "Namespace"
}

# App team A can read platform outputs (to see what's available)
resource "snapcd_role_assignment" "app_a_reads_platform" {
  principal_id = snapcd_group.app_a_team.id
  role         = "Reader"
  scope_id     = snapcd_namespace.platform.id
  scope_type   = "Namespace"
}

Each team can deploy, approve, and manage their own Modules without involving anyone else. They can read the platform team's outputs but can't modify platform resources.

Isolated Runners

Each team deploys their own Runner with only the credentials they need:

The platform team's Runner has credentials for the Azure Network Contributor role.
App team A's Runner has access to their specific resource group.
Neither Runner can access the other team's cloud resources.

Snap CD's permission system also controls which Modules can use which Runners, so even if a team tried to point their Module at the platform Runner, it would be denied.

Automatic dependency wiring

Cross-team dependencies are declared once and enforced automatically:

resource "snapcd_module_input_from_output" "vpc_id" {
  module_id        = snapcd_module.app_a_database.id
  input_kind       = "Param"
  name             = "vpc_id"
  output_module_id = snapcd_module.networking.id
  output_name      = "vpc_id"
}

When the platform team changes networking and the vpc_id output updates, Snap CD automatically queues a re-plan for app team A's database Module. The app team's approval workflow decides whether to apply it. No manual handoff, no polling, no CI glue.

A practical example

An organisation with three teams:

Team	Namespace	Modules	Runner
Platform	`prod/platform`	networking, dns, shared-services	`runner-platform` (Azure Networking + DNS credentials)
App team A	`prod/app-a`	api-database, api-cache, api-storage	`runner-app-a` (Azure App A resource group credentials)
App team B	`prod/app-b`	worker-queue, worker-functions	`runner-app-b` (AWS App B account credentials)

Each team:

Owns their namespace and everything in it.
Deploys using their own runner with scoped credentials.
Approves their own changes without involving other teams.
Receives automatic re-plans when upstream dependencies change.

The platform team can ship a networking change without notifying anyone. Each app team is re-planned automatically only if an output it consumes has changed. If nothing that affects a team changed, nothing happens for that team.

Tips

Start with two teams, not five. Split the most obvious boundary first — usually platform vs. application. Add more boundaries as the need becomes clear.
Give each team a Namespace, not just Modules. Namespaces let you assign permissions once for the whole group rather than per-Module.
Use Reader roles for cross-team visibility. Teams should be able to see what other teams are deploying without being able to modify it.
Don't share Runners across trust boundaries. A Runner that has both prod networking and prod application credentials defeats the purpose of isolation.
Document the dependency graph. Even though Snap CD manages it automatically, teams should understand which of their inputs come from other teams and what would trigger a re-plan.
Resist the urge to centralise approvals. If you've scoped permissions correctly, each team is qualified to approve their own changes. A central approval requirement reintroduces the bottleneck you're trying to eliminate.