
Migrating from a Single CI Pipeline to Multi-State Deployments

You split the monolith. Your Terraform code now lives in separate root modules — networking, compute, database, DNS — each with its own state. The code is cleaner, the blast radius is smaller, and plans are faster.

But your CI pipeline didn't get the memo.

What used to be a single terraform apply step is now a web of jobs that need to run in the right order, pass outputs between each other, and handle failures gracefully. CI tools weren't built for this, and it shows.

The single-pipeline starting point

Most teams start here. One GitHub Actions workflow (or GitLab CI pipeline, or Jenkins job) that runs the full deployment:

# .github/workflows/deploy.yml
name: Deploy Infrastructure
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        run: terraform plan -out=tfplan

      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan

Simple, linear, easy to understand. One state, one pipeline, one set of credentials. It works until it doesn't.

What breaks when you split

After splitting into multiple root modules, you need the CI pipeline to:

  1. Run modules in dependency order. Networking before compute. Compute before DNS. Get the order wrong and the apply fails.
  2. Pass outputs between jobs. Compute needs the vpc_id from networking. DNS needs the load_balancer_ip from compute. CI tools don't have a native concept of Terraform outputs flowing between jobs.
  3. Handle partial failures. If compute fails, DNS shouldn't run — but networking's outputs are still valid. You need selective retry without re-running everything.
  4. Scale with the dependency graph. Every new module means updating the CI config with new jobs, new needs: entries, and new artifact-passing steps.
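One way to avoid hand-maintaining the order from requirement 1 is to derive it from an edge list. The sketch below assumes the four example modules and relies on the standard tsort utility; the module_edges and apply_order function names are illustrative and not part of any tool mentioned here.

```bash
# Dependency edges as "upstream downstream" pairs, mirroring the example graph.
module_edges() {
  cat <<'EOF'
networking compute
networking database
compute dns
EOF
}

# Print the modules so that every upstream appears before its dependents.
apply_order() {
  module_edges | tsort
}

# Hypothetical usage: apply each module in the computed order.
#   for m in $(apply_order); do
#     (cd "modules/$m" && terraform init && terraform apply -auto-approve)
#   done
apply_order
```

This only solves ordering. Passing outputs and retrying selectively still need their own handling.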

The CI glue problem

Here's what the "split" version of that pipeline typically looks like:

# .github/workflows/deploy.yml
name: Deploy Infrastructure
on:
  push:
    branches: [main]

jobs:
  networking:
    runs-on: ubuntu-latest
    outputs:
      vpc_id: ${{ steps.output.outputs.vpc_id }}
      private_subnet_ids: ${{ steps.output.outputs.private_subnet_ids }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false
      - run: cd modules/networking && terraform init && terraform apply -auto-approve
      - id: output
        run: |
          cd modules/networking
          echo "vpc_id=$(terraform output -raw vpc_id)" >> "$GITHUB_OUTPUT"
          echo "private_subnet_ids=$(terraform output -json private_subnet_ids)" >> "$GITHUB_OUTPUT"

  compute:
    needs: [networking]
    runs-on: ubuntu-latest
    outputs:
      cluster_endpoint: ${{ steps.output.outputs.cluster_endpoint }}
      load_balancer_ip: ${{ steps.output.outputs.load_balancer_ip }}
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false
      - run: |
          cd modules/compute
          terraform init
          terraform apply -auto-approve \
            -var="vpc_id=${{ needs.networking.outputs.vpc_id }}" \
            -var='private_subnet_ids=${{ needs.networking.outputs.private_subnet_ids }}'
      - id: output
        run: |
          cd modules/compute
          echo "cluster_endpoint=$(terraform output -raw cluster_endpoint)" >> "$GITHUB_OUTPUT"
          echo "load_balancer_ip=$(terraform output -raw load_balancer_ip)" >> "$GITHUB_OUTPUT"

  database:
    needs: [networking]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false
      - run: |
          cd modules/database
          terraform init
          terraform apply -auto-approve \
            -var="vpc_id=${{ needs.networking.outputs.vpc_id }}" \
            -var='private_subnet_ids=${{ needs.networking.outputs.private_subnet_ids }}'

  dns:
    needs: [compute]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false
      - run: |
          cd modules/dns
          terraform init
          terraform apply -auto-approve \
            -var="load_balancer_ip=${{ needs.compute.outputs.load_balancer_ip }}"

This is roughly 75 lines of YAML to do what used to take about 20. And the problems are already visible:

  • The dependency graph is duplicated. The needs: entries must exactly mirror the Terraform dependency graph. Change a dependency in Terraform and forget to update the CI config, and the pipeline breaks silently or runs out of order.
  • Output passing is fragile. Every output must be explicitly captured with terraform output, written to $GITHUB_OUTPUT, declared in the job's outputs: block, and referenced with ${{ needs.X.outputs.Y }}. Miss one step and the value is silently empty.
  • Credentials are shared. Every job needs every credential, or you need per-job credential configuration — which GitHub Actions doesn't make easy. Terraform's backend blocks don't support variables, so per-module backend configuration in CI means either -backend-config flags on every init or wrapper scripts generating backend files.
  • No change detection. Every push runs every module, even if only one module's code changed.
  • No approval gates. Adding manual approval means adding environment: blocks with protection rules, configured outside the workflow file.
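Teams often bolt path-based change detection onto a pipeline like this. The following is a minimal sketch, assuming every root module lives under modules/<name>/; the changed_modules function and the BASE_SHA and HEAD_SHA variables are hypothetical names chosen for illustration.

```bash
# Reads changed file paths on stdin (one per line) and prints the unique
# module directory names under modules/ that those paths touch.
changed_modules() {
  sed -n 's|^modules/\([^/]*\)/.*|\1|p' | sort -u
}

# Hypothetical usage in CI, where BASE_SHA and HEAD_SHA are assumed to be
# provided by the workflow:
#   git diff --name-only "$BASE_SHA" "$HEAD_SHA" | changed_modules

# Demo with a fixed list of paths; prints "dns".
printf '%s\n' modules/dns/main.tf README.md modules/dns/variables.tf | changed_modules
```

Path-based detection only sees code changes. It does not notice that a downstream module needs a new plan because an upstream output changed, so it is one more piece of glue to maintain.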

The wrapper script escape hatch

Some teams extract the orchestration into a shell script:

#!/bin/bash
set -euo pipefail

cd modules/networking
terraform init && terraform apply -auto-approve
VPC_ID=$(terraform output -raw vpc_id)
SUBNET_IDS=$(terraform output -json private_subnet_ids)

cd ../compute
terraform init
terraform apply -auto-approve \
  -var="vpc_id=$VPC_ID" \
  -var="private_subnet_ids=$SUBNET_IDS"
CLUSTER_ENDPOINT=$(terraform output -raw cluster_endpoint)
# Capture the load balancer IP here, while still in the compute module
LOAD_BALANCER_IP=$(terraform output -raw load_balancer_ip)

cd ../database
terraform init
terraform apply -auto-approve \
  -var="vpc_id=$VPC_ID" \
  -var="private_subnet_ids=$SUBNET_IDS"

cd ../dns
terraform init
terraform apply -auto-approve \
  -var="load_balancer_ip=$LOAD_BALANCER_IP"

This is simpler to read but worse in practice. Everything runs sequentially — compute and database can't run in parallel even though they're independent. Error handling is all-or-nothing. And the dependency graph is now encoded in bash ordering rather than YAML structure, making it even harder to reason about.
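Parallelism can be recovered in bash with background jobs, at the cost of more glue. This is a rough sketch, not a recommendation: apply_module and run_parallel are hypothetical helpers, and inputs are assumed to be passed through TF_VAR_* environment variables rather than -var flags.

```bash
# Applies one root module. Inputs are expected as TF_VAR_* environment
# variables, which Terraform reads for declared input variables.
apply_module() {
  (cd "modules/$1" && terraform init && terraform apply -auto-approve)
}

# Runs "$worker <item>" for every item concurrently and returns non-zero
# if any of them failed.
run_parallel() {
  local worker="$1" pids="" status=0 item pid
  shift
  for item in "$@"; do
    "$worker" "$item" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Hypothetical usage after networking has been applied:
#   export TF_VAR_vpc_id="$VPC_ID"
#   export TF_VAR_private_subnet_ids="$SUBNET_IDS"
#   run_parallel apply_module compute database
```

Even with this in place, log output from the two applies interleaves, and there is still no way to retry only the module that failed.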

Migrating to Snap CD

Snap CD replaces the CI orchestration layer. You define your Modules, wire the dependencies via inputs, and the orchestrator handles execution order, output passing, parallelism, and change detection.

Here's the same four-Module setup in Snap CD, using the Terraform provider. The snapcd_runner.platform resource referenced by each Module is assumed to be defined elsewhere:

resource "snapcd_stack" "infra" {
  name = "infrastructure"
}

resource "snapcd_namespace" "platform" {
  name     = "platform"
  stack_id = snapcd_stack.infra.id
}

resource "snapcd_module" "networking" {
  name         = "networking"
  namespace_id = snapcd_namespace.platform.id
  source_url   = "https://github.com/myorg/infra.git//modules/networking"
  runner_id    = snapcd_runner.platform.id
}

resource "snapcd_module" "compute" {
  name         = "compute"
  namespace_id = snapcd_namespace.platform.id
  source_url   = "https://github.com/myorg/infra.git//modules/compute"
  runner_id    = snapcd_runner.platform.id
}

resource "snapcd_module" "database" {
  name         = "database"
  namespace_id = snapcd_namespace.platform.id
  source_url   = "https://github.com/myorg/infra.git//modules/database"
  runner_id    = snapcd_runner.platform.id
}

resource "snapcd_module" "dns" {
  name         = "dns"
  namespace_id = snapcd_namespace.platform.id
  source_url   = "https://github.com/myorg/infra.git//modules/dns"
  runner_id    = snapcd_runner.platform.id
}

# Wire the dependency graph
resource "snapcd_module_input_from_output" "compute_vpc" {
  module_id        = snapcd_module.compute.id
  input_kind       = "Param"
  name             = "vpc_id"
  output_module_id = snapcd_module.networking.id
  output_name      = "vpc_id"
}

resource "snapcd_module_input_from_output" "compute_subnets" {
  module_id        = snapcd_module.compute.id
  input_kind       = "Param"
  name             = "private_subnet_ids"
  output_module_id = snapcd_module.networking.id
  output_name      = "private_subnet_ids"
}

resource "snapcd_module_input_from_output" "database_vpc" {
  module_id        = snapcd_module.database.id
  input_kind       = "Param"
  name             = "vpc_id"
  output_module_id = snapcd_module.networking.id
  output_name      = "vpc_id"
}

resource "snapcd_module_input_from_output" "database_subnets" {
  module_id        = snapcd_module.database.id
  input_kind       = "Param"
  name             = "private_subnet_ids"
  output_module_id = snapcd_module.networking.id
  output_name      = "private_subnet_ids"
}

resource "snapcd_module_input_from_output" "dns_lb" {
  module_id        = snapcd_module.dns.id
  input_kind       = "Param"
  name             = "load_balancer_ip"
  output_module_id = snapcd_module.compute.id
  output_name      = "load_balancer_ip"
}

Once this is applied, Snap CD handles the rest:

  • Dependency ordering is automatic. Snap CD knows that compute and database depend on networking, and DNS depends on compute. It runs them in the right order without you encoding it anywhere else.
  • Outputs flow automatically. When networking's vpc_id changes, Snap CD passes the new value to compute and database and triggers re-plans. No capture scripts, no artifacts.
  • Independent Modules run in parallel. Compute and database both depend on networking but not on each other — Snap CD runs them concurrently.
  • Change detection is built in. A commit to modules/dns only triggers a plan for the DNS Module, not the entire graph.
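To make the propagation idea concrete, here is a conceptual sketch of finding every Module downstream of a change by walking the dependency edges. Snap CD's implementation isn't shown in this article, so treat the downstream_of function as a hypothetical illustration of the idea rather than how the orchestrator works.

```bash
# Reads "upstream downstream" pairs on stdin and prints every module that
# transitively depends on the module named in $1, in breadth-first order.
downstream_of() {
  awk -v start="$1" '
    { n++; up[n] = $1; down[n] = $2 }
    END {
      seen[start] = 1
      queue[1] = start; head = 1; tail = 1
      while (head <= tail) {
        current = queue[head++]
        for (i = 1; i <= n; i++) {
          if (up[i] == current && !(down[i] in seen)) {
            seen[down[i]] = 1
            queue[++tail] = down[i]
            print down[i]
          }
        }
      }
    }'
}

# Demo with the example graph; prints compute, database, dns (one per line).
printf 'networking compute\nnetworking database\ncompute dns\n' | downstream_of networking
```

In the CI version, this walk is something you would have to write and keep in sync yourself.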

The migration path

Terraform's native tooling for moving resources between states has long been a pain point. There's no built-in terraform state split command (requested since 2018), so teams rely on terraform state mv, which doesn't work well across remote states. The moved block gained cross-package support in v1.3, but there's still no way to move resources between completely separate state files declaratively, so you're back to terraform state mv one resource at a time.

The orchestration layer, by contrast, can be moved incrementally. You don't have to migrate everything at once. A practical approach:

Step 1: Start with the leaf modules

Pick the Modules with no dependents — typically monitoring, DNS, or application-specific infrastructure. Move them to Snap CD while keeping the rest in CI. This is low-risk because nothing depends on them.
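If the graph is large, the leaves can be found mechanically from the same edge list your CI needs: entries already encode. A minimal sketch, assuming each input line is an "upstream downstream" pair; leaf_modules is a hypothetical helper name.

```bash
# Reads "upstream downstream" pairs on stdin and prints the modules that
# never appear as an upstream, i.e. the ones nothing else depends on.
leaf_modules() {
  awk '
    { has_dependents[$1] = 1; all[$1] = 1; all[$2] = 1 }
    END { for (m in all) if (!(m in has_dependents)) print m }
  ' | sort
}

# Demo with the example graph; prints database and dns (one per line).
printf 'networking compute\nnetworking database\ncompute dns\n' | leaf_modules
```

Modules with no edges at all never show up in the input, so add them to the candidate list by hand.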

Step 2: Work backward through the dependency graph

Once the leaves are working, move their parents. At each step, the CI pipeline gets shorter — fewer jobs, fewer output-passing steps, fewer needs: entries.

Step 3: Retire the CI pipeline

When all Modules are in Snap CD, the deployment workflow file can be deleted. If you still want CI to check Terraform code on pull requests (formatting and validation), keep a lightweight workflow that only runs terraform fmt -check and terraform validate. The deployment orchestration lives in Snap CD.

What your CI pipeline looks like after

# .github/workflows/validate.yml
name: Validate
on:
  pull_request:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive
      - run: |
          for dir in modules/*/; do
            terraform -chdir="$dir" init -backend=false
            terraform -chdir="$dir" validate
          done

Validation in CI, deployment in Snap CD. Each tool doing what it's good at.

Tips

  • Don't rewrite your Terraform code. Snap CD works with your existing Modules. The only change is replacing hard-coded values with variables where outputs need to flow between states — which you've likely already done if you split the monolith.
  • Keep the CI pipeline running in parallel during migration. Run both CI and Snap CD for the same Modules until you're confident the Snap CD setup is correct. Snap CD's approval gates let you verify plans before applying.
  • Use approval gates during the transition. Set apply_approval_threshold = 1 on newly migrated Modules so you can review every plan before it applies. Remove the gate once you trust the setup.
  • Check the existing guide on splitting. If you haven't split your monolith yet, read Splitting a Terraform Monolith into Smaller States first — it covers the Terraform side of the migration.


© 2026 Snap CD. All rights reserved.
