# Infrastructure Management

This use case demonstrates how to manage infrastructure as code using OpenTofu (or Terraform) within Argo Workflows, orchestrated through Pipekit.

## Overview

Pipekit can orchestrate infrastructure-as-code workflows that validate, plan, and apply infrastructure changes. This approach provides automated testing, drift detection, and controlled deployment of infrastructure changes through your CI/CD pipeline.

## Key Workflows

Infrastructure management workflows typically include:

* **Security scanning** with tools like Checkov to identify misconfigurations
* **Linting** with tflint to enforce best practices
* **Plan generation** showing proposed infrastructure changes
* **Drift detection** identifying when live infrastructure diverges from code
* **Automated PR comments** providing visibility into proposed changes

## Example: Pull Request Validation

When a pull request modifies infrastructure code, automated workflows can validate the changes before merge.

The workflow performs several validation steps:

1. Clone the infrastructure repository
2. Run security scans to detect misconfigurations
3. Run linting to ensure code quality
4. Generate Terraform/OpenTofu plans showing what will change
5. Post plan summaries as PR comments for review

### Workflow Structure

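The PR validation workflow consists of multiple tasks organized as a DAG (directed acyclic graph). The simplified view below omits the `get-pr` task and the per-directory fan-out (`withItems`) that appear in the complete workflow: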

```yaml
- name: main
  dag:
    tasks:
      - name: clone-repo
        template: clone-repo
      - name: checkov-scan
        template: checkov-scan
        depends: clone-repo
      - name: tflint
        template: tflint
        depends: clone-repo
      - name: tfplan
        template: tfplan
        depends: tflint
      - name: tfplan-to-comment
        template: tfplan-to-comment
        depends: tfplan
```

<details>

<summary>View complete PR validation workflow</summary>

````yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: tf-pr-
  namespace: ci
spec:
  serviceAccountName: ci
  entrypoint: main
  synchronization:
    mutexes:
      - name: tf
  volumeClaimTemplates:
  - metadata:
      name: workdir
    spec:
      accessModes: [ "ReadWriteMany" ]
      storageClassName: nfs
      resources:
        requests:
          storage: 1Gi
  templates:
    - name: main
      dag:
        tasks:
          - name: clone-repo
            template: clone-repo
          - name: get-pr
            template: get-pr
          - name: checkov-scan
            template: checkov-scan
            arguments:
              parameters:
              - name: path
                value: "{{item}}"
            withItems: [
              terraform,
              terraform/region-1,
              terraform/region-2
            ]
            depends: clone-repo
          - name: tflint
            template: tflint
            arguments:
              parameters:
              - name: path
                value: "{{item}}"
            withItems: [
              terraform,
              terraform/region-1,
              terraform/region-2
            ]
            depends: clone-repo
          - name: tfplan
            template: tfplan
            arguments:
              parameters:
              - name: path
                value: "{{item}}"
            withItems: [
              terraform,
              terraform/region-1,
              terraform/region-2
            ]
            depends: tflint
          - name: tfplan-to-comment
            template: tfplan-to-comment
            arguments:
              parameters:
                - name: pr_num
                  value: "{{tasks.get-pr.outputs.parameters.pr_num}}"
                - name: index
                  value: "{{item}}"
            withItems: [
              terraform,
              terraform/region-1,
              terraform/region-2
            ]
            depends: (tfplan && get-pr)

    - name: clone-repo
      container:
        image: alpine
        command:
          - sh
          - -c
          - |
            apk --update add openssh-client git
            eval `ssh-agent -s`
            mkdir -p /workdir/src/github.com/<your-org>;
            cd /workdir/src/github.com/<your-org>;
            ssh-add /root/.ssh/ssh-deploy-key;
            ssh-keyscan github.com > /root/.ssh/known_hosts;
            git config --global --add safe.directory '*';
            git clone git@github.com:<your-org>/<your-repo>.git;
            cd <your-repo>;
            git checkout $GIT_COMMIT;
        volumeMounts:
        - name: workdir
          mountPath: /workdir

    - name: checkov-scan
      inputs:
        parameters:
          - name: path
      container:
        image: <your-registry>/terraform
        command:
          - bash
          - -c
          - |
            mkdir -p /checkov-scan
            cp -R /workdir/src/github.com/<your-org>/<your-repo> /checkov-scan
            cd /checkov-scan/<your-repo>/{{inputs.parameters.path}}/
            tofu init
            checkov --quiet --compact --directory . --repo-root-for-plan-enrichment .
        env:
          - name: AWS_ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                name: <your-secret>
                key: aws-access-key-id
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                name: <your-secret>
                key: aws-secret-access-key
        volumeMounts:
        - name: workdir
          mountPath: /workdir

    - name: tflint
      inputs:
        parameters:
          - name: path
      container:
        image: <your-registry>/terraform
        command:
          - bash
          - -c
          - |
            mkdir -p /tflint-dir
            cp -R /workdir/src/github.com/<your-org>/<your-repo> /tflint-dir
            cd /tflint-dir/<your-repo>/{{inputs.parameters.path}}/
            tofu init
            tflint --init --no-color
            tflint --no-color
        env:
          - name: AWS_ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                name: <your-secret>
                key: aws-access-key-id
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                name: <your-secret>
                key: aws-secret-access-key
        volumeMounts:
        - name: workdir
          mountPath: /workdir

    - name: tfplan
      inputs:
        parameters:
          - name: path
      container:
        image: <your-registry>/terraform
        command:
          - bash
          - -c
          - |
            mkdir -p /tfplan-dir
            mkdir -p /workdir/terraform
            cp -R /workdir/src/github.com/<your-org>/<your-repo> /tfplan-dir
            cd /tfplan-dir/<your-repo>/{{inputs.parameters.path}}/
            tofu init
            tofu plan -lock-timeout=600s -out=tfplan
            tofu show -json tfplan | tf-summarize > /workdir/{{inputs.parameters.path}}-plan.txt
            tofu show tfplan
            tf-summarize tfplan
        env:
          - name: AWS_ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                name: <your-secret>
                key: aws-access-key-id
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                name: <your-secret>
                key: aws-secret-access-key
        volumeMounts:
        - name: workdir
          mountPath: /workdir

    - name: tfplan-to-comment
      inputs:
        parameters:
          - name: pr_num
          - name: index
      container:
        image: cloudposse/github-commenter:0.16.2
        env:
          - name: GITHUB_TOKEN
            valueFrom:
              secretKeyRef:
                name: <your-secret>
                key: github-token
          - name: GITHUB_COMMENT_FORMAT
            value: |
              Tofu plan ({{inputs.parameters.index}}):
              ```
              {{.}}
              ```
        command:
          - /bin/sh
          - -c
          - |
            # isPR is not defined in this example; supply it as an environment variable.
            if [ "${isPR}" = "true" ]
            then
              cat /workdir/{{inputs.parameters.index}}-plan.txt | github-commenter
            fi
        volumeMounts:
        - name: workdir
          mountPath: /workdir
````

</details>
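The complete workflow references a `get-pr` template and reads its `pr_num` output, but does not define it. A minimal sketch of what such a template might look like follows. It assumes `GIT_COMMIT` is available to the container, that the token stored under `github-token` can read the repository, and it uses GitHub's REST endpoint for listing the pull requests associated with a commit. Treat the whole template as hypothetical, not as the original implementation:

```yaml
- name: get-pr
  container:
    image: alpine
    command:
      - sh
      - -c
      - |
        apk --update add curl jq
        # Ask GitHub which pull requests contain this commit and keep the first number.
        # GIT_COMMIT is assumed to be provided to the container (not shown here).
        curl -sf \
          -H "Authorization: Bearer ${GITHUB_TOKEN}" \
          -H "Accept: application/vnd.github+json" \
          "https://api.github.com/repos/<your-org>/<your-repo>/commits/${GIT_COMMIT}/pulls" \
          | jq -r '.[0].number // empty' > /tmp/pr_num
    env:
      - name: GITHUB_TOKEN
        valueFrom:
          secretKeyRef:
            name: <your-secret>
            key: github-token
  outputs:
    parameters:
      - name: pr_num
        valueFrom:
          path: /tmp/pr_num
```

If the commit does not belong to any pull request, `/tmp/pr_num` is left empty, so downstream tasks should treat an empty `pr_num` as "nothing to comment on".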

### Security Scanning

The workflow uses Checkov to scan infrastructure code for security and compliance issues:

```yaml
- name: checkov-scan
  container:
    image: <your-registry>/terraform
    command:
      - bash
      - -c
      - |
        cd /workdir/terraform/
        tofu init
        checkov --quiet --compact --directory . --repo-root-for-plan-enrichment .
```

### Terraform Planning

The workflow generates a plan showing what infrastructure changes would occur:

```yaml
- name: tfplan
  container:
    image: <your-registry>/terraform
    command:
      - bash
      - -c
      - |
        cd /workdir/terraform/
        tofu init
        tofu plan -lock-timeout=600s -out=tfplan
        tofu show -json tfplan | tf-summarize > /workdir/plan.txt
        tofu show tfplan
```

The plan output is captured and posted as a comment on the pull request, providing reviewers with clear visibility into the proposed changes.
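In the complete workflow, `tfplan-to-comment` declares a `pr_num` input but never passes it to the commenter. One way to wire it through is with the environment variables described in the `cloudposse/github-commenter` README. The sketch below assumes those variable names apply to the image version you use; verify them before relying on this:

```yaml
- name: tfplan-to-comment
  inputs:
    parameters:
      - name: pr_num
      - name: index
  container:
    image: cloudposse/github-commenter:0.16.2
    env:
      - name: GITHUB_TOKEN
        valueFrom:
          secretKeyRef:
            name: <your-secret>
            key: github-token
      # Assumed variable names, taken from the github-commenter README.
      - name: GITHUB_OWNER
        value: <your-org>
      - name: GITHUB_REPO
        value: <your-repo>
      - name: GITHUB_COMMENT_TYPE
        value: pr
      - name: GITHUB_PR_ISSUE_NUMBER
        value: "{{inputs.parameters.pr_num}}"
```

The `command`, `GITHUB_COMMENT_FORMAT`, and `volumeMounts` entries stay as they are in the complete workflow.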

## Example: Nightly Drift Detection

Infrastructure drift occurs when the actual state of your infrastructure diverges from what is defined in your code. A scheduled CronWorkflow can detect this drift by running `tofu plan` regularly and alerting when changes are detected.

### CronWorkflow Structure

The drift detection workflow runs on a schedule and checks for infrastructure changes. The example schedule, `0 13 * * 1-5`, runs at 13:00 UTC on weekdays; adjust it to suit your team:

```yaml
spec:
  schedules:
    - "0 13 * * 1-5"
  timezone: "UTC"
  workflowSpec:
    entrypoint: main
    templates:
      - name: main
        dag:
          tasks:
            - name: clone-repo
              template: clone-repo
            - name: check-and-notify
              template: check-and-notify
              depends: clone-repo
```

<details>

<summary>View complete nightly drift detection workflow</summary>

```yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: tf-nightly
  namespace: ci
  labels:
    cron: "true"
spec:
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  schedules:
    - "0 13 * * 1-5"
  timezone: "UTC"
  startingDeadlineSeconds: 0
  suspend: false
  workflowSpec:
    entrypoint: main
    synchronization:
      mutexes:
        - name: tf
    serviceAccountName: ci
    volumeClaimTemplates:
    - metadata:
        name: workdir
      spec:
        accessModes: [ "ReadWriteMany" ]
        storageClassName: nfs
        resources:
          requests:
            storage: 1Gi
    templates:
      - name: main
        dag:
          tasks:
            - name: clone-repo
              template: clone-repo
            - name: check-and-notify
              template: check-and-notify
              arguments:
                parameters:
                - name: path
                  value: "{{item}}"
              withItems: [
                terraform,
                terraform/region-1,
                terraform/region-2
              ]
              depends: clone-repo

      - name: check-and-notify
        inputs:
          parameters:
            - name: path
        dag:
          tasks:
            - name: tfplan
              template: tfplan
              arguments:
                parameters:
                - name: path
                  value: "{{inputs.parameters.path}}"
            - name: send-notification
              template: send-notification
              arguments:
                parameters:
                  - name: title
                    value: "Infrastructure drift detected"
                  - name: message
                    value: "Infrastructure has diverged from Terraform code in <your-repo>/{{inputs.parameters.path}}."
                  - name: exitcode
                    value: "{{tasks.tfplan.outputs.parameters.exitcode}}"
                  - name: path
                    value: "{{inputs.parameters.path}}"
              depends: tfplan

      - name: clone-repo
        container:
          image: alpine
          command:
            - sh
            - -c
            - |
              apk --update add openssh-client git
              eval `ssh-agent -s`
              mkdir -p /workdir/src/github.com/<your-org>;
              cd /workdir/src/github.com/<your-org>;
              ssh-add /root/.ssh/ssh-deploy-key;
              ssh-keyscan github.com > /root/.ssh/known_hosts;
              git config --global --add safe.directory '*';
              git clone git@github.com:<your-org>/<your-repo>.git;
              cd <your-repo>;
              git checkout $GIT_COMMIT;
          volumeMounts:
          - name: workdir
            mountPath: /workdir

      - name: tfplan
        inputs:
          parameters:
            - name: path
        container:
          image: <your-registry>/terraform
          command:
            - bash
            - -c
            - |
              mkdir -p /tfplan-dir
              cp -R /workdir/src/github.com/<your-org>/<your-repo> /tfplan-dir
              cd /tfplan-dir/<your-repo>/{{inputs.parameters.path}}/
              tofu init
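              tofu plan -detailed-exitcode -lock-timeout=600s
              rc=$?
              # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
              if [ "$rc" -eq 2 ]; then
                  echo "Infrastructure drift detected"
                  echo "0" > /tmp/exitcode   # "0" tells send-notification to alert
              elif [ "$rc" -eq 0 ]; then
                  echo "Infrastructure matches code"
                  echo "1" > /tmp/exitcode
              else
                  echo "tofu plan failed with exit code $rc" >&2
                  exit "$rc"
              fi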
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: <your-secret>
                  key: aws-access-key-id
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: <your-secret>
                  key: aws-secret-access-key
          volumeMounts:
          - name: workdir
            mountPath: /workdir
        outputs:
          parameters:
          - name: exitcode
            valueFrom:
              path: /tmp/exitcode

      - name: send-notification
        inputs:
          parameters:
            - name: title
            - name: message
            - name: exitcode
            - name: path
        container:
          image: alpine
          command:
            - sh
            - -c
            - |
              if [ "{{inputs.parameters.exitcode}}" = "0" ]
              then
                TITLE="{{inputs.parameters.title}}"
                MESSAGE="{{inputs.parameters.message}}"
                # Send the notification via your preferred method (Slack, email, etc.).
                printf '%s\n%s\n' "$TITLE" "$MESSAGE"
                # Example: post to a Slack-style webhook (alpine needs `apk add curl` first).
                # curl -H "Content-type: application/json" -X POST \
                #   -d "{\"text\": \"${TITLE}: ${MESSAGE}\"}" "${WEBHOOK_URL}"
              fi
          env:
            - name: WEBHOOK_URL
              valueFrom:
                secretKeyRef:
                  name: <your-secret>
                  key: webhook-url
```

</details>
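Both complete workflows reference `$GIT_COMMIT` and `/root/.ssh/ssh-deploy-key` in `clone-repo` without showing where they come from. The fragment below sketches one possible way to supply them. The `git-commit` workflow parameter, the `ssh-deploy-key` volume name, and the assumption that the Secret contains a key called `ssh-deploy-key` are all hypothetical:

```yaml
spec:
  arguments:
    parameters:
      - name: git-commit            # hypothetical parameter; override it per run
        value: main
  volumes:
    - name: ssh-deploy-key
      secret:
        secretName: <your-secret>
        defaultMode: 0400           # ssh-add rejects keys readable by group or others
  templates:
    - name: clone-repo
      container:
        image: alpine
        env:
          - name: GIT_COMMIT
            value: "{{workflow.parameters.git-commit}}"
        volumeMounts:
          - name: workdir
            mountPath: /workdir
          # Mount only the key file so /root/.ssh stays writable for known_hosts.
          - name: ssh-deploy-key
            mountPath: /root/.ssh/ssh-deploy-key
            subPath: ssh-deploy-key
            readOnly: true
```

In the CronWorkflow, the same `arguments`, `volumes`, and `templates` fields sit under `workflowSpec` instead of `spec`. The `command` block of `clone-repo` is unchanged and omitted here.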

### Drift Detection Logic

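The workflow runs `tofu plan` with the `-detailed-exitcode` flag, which returns exit code 0 when there are no changes, 1 when the plan fails, and 2 when changes are detected. The script writes `0` to `/tmp/exitcode` when it finds drift, and the notification step alerts only on that value: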

```yaml
- name: tfplan
  container:
    image: <your-registry>/terraform
    command:
      - bash
      - -c
      - |
        cd /workdir/terraform/
        tofu init
        tofu plan -detailed-exitcode -lock-timeout=600s
        rc=$?
        if [ "$rc" -eq 2 ]; then
            echo "Infrastructure drift detected"
            echo "0" > /tmp/exitcode
        elif [ "$rc" -eq 0 ]; then
            echo "Infrastructure matches code"
            echo "1" > /tmp/exitcode
        else
            echo "tofu plan failed with exit code $rc" >&2
            exit "$rc"
        fi
```

When drift is detected, the workflow sends a notification to alert the team.
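As an alternative to checking the value inside the notification script, the condition could live on the DAG task itself. The sketch below assumes the `tfplan` template exposes `/tmp/exitcode` as an output parameter named `exitcode`, as in the complete workflow, and uses Argo's `when` field so the notification task is skipped unless drift was found:

```yaml
- name: check-and-notify
  inputs:
    parameters:
      - name: path
  dag:
    tasks:
      - name: tfplan
        template: tfplan
        arguments:
          parameters:
            - name: path
              value: "{{inputs.parameters.path}}"
      - name: send-notification
        template: send-notification
        depends: tfplan
        # Run only when tfplan wrote "0" (drift found) to /tmp/exitcode.
        when: "{{tasks.tfplan.outputs.parameters.exitcode}} == 0"
        arguments:
          parameters:
            - name: title
              value: "Infrastructure drift detected"
            - name: message
              value: "Infrastructure has diverged from Terraform code in <your-repo>/{{inputs.parameters.path}}."
            - name: exitcode
              value: "{{tasks.tfplan.outputs.parameters.exitcode}}"
            - name: path
              value: "{{inputs.parameters.path}}"
```

With this variant, skipped notifications show up as skipped nodes in the run graph, which makes it easier to see at a glance which directories drifted.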

## Managing Workflows with Pipekit

Pipekit provides a control plane for managing these infrastructure workflows. The default option is a [hosted SaaS control plane](/readme.md#pipekit-architecture), which is the easiest and fastest way to get started. For organizations with specific compliance or infrastructure requirements, you can [self-host Pipekit](/self-hosting-pipekit.md) in your own environment.

### Viewing Workflow Runs

View all workflow runs in the Pipekit UI through [Pipes](/pipekit/pipes.md). Each [Pipe Run](/pipekit/pipes/pipe-runs.md) provides:

* [Run Graph (DAG)](/pipekit/pipes/pipe-runs/run-graph.md) showing task dependencies
* [Pod Logs](/pipekit/pipes/pipe-runs/pod-logs.md) for debugging
* [Workflow YAML](/pipekit/pipes/pipe-runs/workflow-yaml.md) for inspection

### Using the CLI

The [Pipekit CLI](/cli.md) allows you to interact with infrastructure workflows from your terminal. Use it to list runs, view logs, and manage workflow execution.

## Best Practices

### Use Mutual Exclusion

Infrastructure operations should not run concurrently on the same resources. Use Argo's `synchronization.mutexes` to ensure only one workflow modifies infrastructure at a time:

```yaml
spec:
  synchronization:
    mutexes:
      - name: tf
```

### Secure Credentials Management

Store sensitive credentials in Kubernetes Secrets and inject them into workflow pods as environment variables:

```yaml
env:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: <your-secret>
        key: aws-access-key-id
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: <your-secret>
        key: aws-secret-access-key
```

For enhanced security, consider using external secrets management solutions like HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault with appropriate Kubernetes integrations.
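As one illustration of such an integration, the External Secrets Operator can sync values from an external store into the Kubernetes Secret that the workflows reference. The manifest below is a sketch only: the resource name, the `SecretStore` name, and the remote key and property names are hypothetical, and the API version may differ depending on the operator release you run:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: tf-aws-credentials          # hypothetical name
  namespace: ci
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager       # hypothetical SecretStore, configured separately
    kind: SecretStore
  target:
    name: <your-secret>             # the Secret referenced by secretKeyRef in the workflows
  data:
    - secretKey: aws-access-key-id
      remoteRef:
        key: ci/terraform           # hypothetical entry in the external store
        property: access-key-id
    - secretKey: aws-secret-access-key
      remoteRef:
        key: ci/terraform
        property: secret-access-key
```

The workflow manifests do not change with this approach; they keep reading `<your-secret>` through `secretKeyRef`.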

### Separate Plan from Apply

Never run `terraform apply` or `tofu apply` without review. Generate plans for review first, then apply changes manually or with explicit human approval.

Consider creating a [Workflow Template](/pipekit/templates.md) for the apply step. This allows authorized users to manually submit the template to apply infrastructure changes without needing to configure their local environment with the correct credentials and tooling. The template can include all necessary credentials, container images, and configuration, ensuring consistent and secure infrastructure deployments.

For example, the following simplified template takes the path of a Terraform directory as input and applies it. It omits the steps that clone the repository and mount `/workdir`:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: terraform-apply
spec:
  entrypoint: apply
  templates:
    - name: apply
      inputs:
        parameters:
          - name: terraform-path
      container:
        image: <your-registry>/terraform
        command:
          - bash
          - -c
          - |
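            cd /workdir/{{inputs.parameters.terraform-path}}/
            tofu init
            # -auto-approve skips the interactive prompt; submitting this template is the approval step.
            tofu apply -auto-approve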
        env:
          - name: AWS_ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                name: <your-secret>
                key: aws-access-key-id
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                name: <your-secret>
                key: aws-secret-access-key
```

Users can then submit this template from Pipekit when they're ready to apply reviewed changes.
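Outside the Pipekit UI, the same template can be run by submitting a Workflow that references it. The manifest below is a sketch under a few assumptions: the `terraform-apply` template lives in the `ci` namespace, the `ci` service account used elsewhere on this page is appropriate, and `terraform/region-1` is the directory to apply:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: terraform-apply-
  namespace: ci                     # assumed to match the WorkflowTemplate's namespace
spec:
  serviceAccountName: ci
  workflowTemplateRef:
    name: terraform-apply
  arguments:
    parameters:
      # Resolved by the entrypoint template's `terraform-path` input.
      - name: terraform-path
        value: terraform/region-1
```

Because the directory is passed as a parameter, one template can serve every Terraform root in the repository.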

### Monitor for Drift

Run drift detection workflows on a regular schedule (e.g., nightly) to catch unexpected infrastructure changes early. Alert your team when drift is detected so they can investigate and remediate.
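A schedule that runs every night, rather than on weekday afternoons, might look like the fragment below. The 02:00 time is an arbitrary choice, and `concurrencyPolicy: Forbid` is an optional addition that prevents a new run from starting while the previous one is still active:

```yaml
spec:
  schedules:
    - "0 2 * * *"                   # every day at 02:00
  timezone: "UTC"
  concurrencyPolicy: Forbid         # skip a run if the previous one has not finished
  workflowSpec:
    entrypoint: main
```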

### Use Node Selectors

For resource-intensive operations like security scanning, specify node selectors to ensure workflows run on appropriate hardware:

```yaml
nodeSelector:
  workload-type: compute-intensive
```
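The selector can be set on an individual template so that only the heavy task is pinned to those nodes. The fragment below is a sketch that assumes your nodes carry the `workload-type: compute-intensive` label; the label itself is an example, not a Kubernetes or Argo convention:

```yaml
- name: checkov-scan
  # Applies to this template's pod only; other tasks schedule normally.
  nodeSelector:
    workload-type: compute-intensive
  inputs:
    parameters:
      - name: path
  container:
    image: <your-registry>/terraform
```

Setting `nodeSelector` at the `spec` level instead applies it to every pod in the workflow.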

## Related Resources

* [CLI documentation](/cli.md)
* [Pipes documentation](/pipekit/pipes.md)
* [Managing Secrets](/pipekit/pipes/managing-pipes/secrets.md)
* [Alerting configuration](/pipekit/pipes/managing-pipes/alerting.md)


