CI/CD Pipeline

This project uses GitHub Actions with Databricks Asset Bundles (DABs) for automated deployment.

Pipeline Overview

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   PR/Push   │────▶│   Validate  │────▶│   Deploy    │
│   to main   │     │   & Lint    │     │   to Dev    │
└─────────────┘     └─────────────┘     └─────────────┘
                    ┌─────────────┐           │
                    │   Manual    │◀──────────┘
                    │   Trigger   │
                    └──────┬──────┘
              ┌────────────┴────────────┐
              ▼                         ▼
       ┌─────────────┐          ┌─────────────┐
       │   Deploy    │          │   Deploy    │
       │   to Stage  │          │   to Prod   │
       └─────────────┘          └─────────────┘

Environments

| Environment | Trigger          | Purpose                   |
|-------------|------------------|---------------------------|
| dev         | Push to main, PR | Development testing       |
| stage       | Manual           | Pre-production validation |
| prod        | Manual           | Production deployment     |
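
The trigger matrix above might map onto a GitHub Actions workflow roughly like this. The file path, input names, and option values are illustrative assumptions, not the project's actual workflow:

```yaml
# .github/workflows/deploy.yml (illustrative sketch)
name: Deploy DABs
on:
  push:
    branches: [main]      # deploys to dev
  pull_request:
    branches: [main]      # validate & lint only
  workflow_dispatch:      # manual trigger for stage/prod
    inputs:
      target:
        description: "Bundle target"
        required: true
        type: choice
        options: [stage, prod]
permissions:
  id-token: write   # required for OIDC federation (see Prerequisites below)
  contents: read
```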

GitHub Secrets Required

Configure these secrets in your GitHub repository for each environment:

| Secret                 | Description                                                            | Required | Default                       |
|------------------------|------------------------------------------------------------------------|----------|-------------------------------|
| DATABRICKS_HOST        | Databricks workspace URL (e.g., https://dbc-xxx.cloud.databricks.com)   | Yes      | -                             |
| DATABRICKS_CLIENT_ID   | Service principal application ID                                        | Yes      | -                             |
| LAKEBASE_HOST          | Lakebase PostgreSQL host for DQ rule storage                            | No       | -                             |
| LAKEBASE_DATABASE      | Lakebase database name                                                  | No       | databricks_postgres           |
| MODEL_SERVING_ENDPOINT | Model serving endpoint for AI analysis                                  | No       | databricks-claude-sonnet-4-5  |
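
A deploy step might consume these secrets like the sketch below. The step name is illustrative, and `DATABRICKS_AUTH_TYPE: github-oidc` assumes the Databricks CLI's GitHub OIDC auth type is used instead of a stored token:

```yaml
# Illustrative deploy step; secret names match the table above
- name: Deploy bundle
  env:
    DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
    DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
    DATABRICKS_AUTH_TYPE: github-oidc   # assumption: OIDC federation, no PAT
  run: databricks bundle deploy -t dev
```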

Prerequisites: GitHub OIDC Federation

Before CI/CD will work, configure workload identity federation in Databricks:

1. Create a Service Principal

In Databricks Account Console, create a service principal.
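
As an alternative to the Account Console UI, the service principal can be created from the CLI. This is a sketch; the display name is illustrative:

```shell
# Create the service principal at the account level
databricks account service-principals create \
  --display-name "dqx-cicd-sp"

# From the response, note the numeric "id" (used as <SP_ID> below) and the
# "applicationId" (goes into the DATABRICKS_CLIENT_ID secret)
```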

2. Create Federation Policy

databricks account service-principal-federation-policy create <SP_ID> --json '{
  "oidc_policy": {
    "issuer": "https://token.actions.githubusercontent.com",
    "audiences": ["<DATABRICKS_ACCOUNT_ID>"],
    "subject": "repo:<GITHUB_ORG>/<REPO_NAME>:environment:<ENV>"
  }
}'
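
Because the policy subject pins a single GitHub environment, you would likely create one policy per environment (dev, stage, prod), each with the matching `<ENV>` value. To confirm what exists for a service principal (sketch; `<SP_ID>` is the numeric service principal ID):

```shell
# List federation policies attached to the service principal
databricks account service-principal-federation-policy list <SP_ID>
```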

3. Grant Workspace Access

Grant the service principal access to your workspace.
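
One way to do this from the CLI is via workspace assignments. This is a sketch; the permission level and placeholders are assumptions to adjust for your setup:

```shell
# Assign the service principal to the workspace
# <WORKSPACE_ID> is the numeric workspace ID, <SP_ID> the numeric SP ID
databricks account workspace-assignment update \
  <WORKSPACE_ID> <SP_ID> --json '{"permissions": ["USER"]}'
```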

See Enable workload identity federation for GitHub Actions for details.

Manual Deployment

Deploy using Databricks CLI:

# Install the Databricks CLI (v0.205+). Note: the legacy PyPI package
# (`pip install databricks-cli`) does NOT include the `bundle` commands.
brew tap databricks/tap && brew install databricks
# or: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

# Set the workspace and authenticate (OAuth login; or set DATABRICKS_TOKEN)
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
databricks auth login --host "$DATABRICKS_HOST"

# Validate bundle
databricks bundle validate -t dev

# Deploy to environment
databricks bundle deploy -t dev    # Development
databricks bundle deploy -t stage  # Staging
databricks bundle deploy -t prod   # Production
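
After a deploy, a bundle-defined job can be triggered directly. The resource key below is an assumption based on the file names under `resources/`; check those files for the actual keys:

```shell
# Run the DQ rule generation job in the dev target (resource key assumed)
databricks bundle run -t dev generation_job
```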

Bundle Configuration

The bundle uses a modular structure with Serverless compute:

databricks.yml                    # Main config (includes other files)
resources/
├── apps.yml                      # Databricks App definition
├── generation_job.yml            # DQ rule generation job (Serverless)
└── validation_job.yml            # DQ rule validation job (Serverless)
environments/
├── dev/
│   ├── targets.yml               # Dev target (mode: production)
│   ├── variables.yml             # Dev variables
│   └── permissions.yml           # Dev permissions
├── stage/
│   ├── targets.yml               # Stage target (mode: production)
│   ├── variables.yml             # Stage variables
│   └── permissions.yml           # Stage permissions
└── prod/
    ├── targets.yml               # Prod target (mode: production)
    ├── variables.yml             # Prod variables
    └── permissions.yml           # Prod permissions

Environment Differences

| Setting  | Development                | Staging                      | Production         |
|----------|----------------------------|------------------------------|--------------------|
| App Name | dqx-rule-generator-dev     | dqx-rule-generator-stage     | dqx-rule-generator |
| Job Name | DQ Rule Generation - Dev   | DQ Rule Generation - Stage   | DQ Rule Generation |
| Compute  | Serverless                 | Serverless                   | Serverless         |