CI/CD Pipeline

This project uses GitHub Actions with Databricks Asset Bundles (DABs) for automated deployment.

Pipeline Overview

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   PR/Push   │────▶│   Validate  │────▶│   Deploy    │
│   to main   │     │   & Lint    │     │   to Dev    │
└─────────────┘     └─────────────┘     └─────────────┘
                    ┌─────────────┐           │
                    │   Manual    │◀──────────┘
                    │   Trigger   │
                    └──────┬──────┘
              ┌────────────┴────────────┐
              ▼                         ▼
       ┌─────────────┐          ┌─────────────┐
       │   Deploy    │          │   Deploy    │
       │   to Stage  │          │   to Prod   │
       └─────────────┘          └─────────────┘

Environments

| Environment | Trigger          | Purpose                   |
|-------------|------------------|---------------------------|
| dev         | Push to main, PR | Development testing       |
| stage       | Manual           | Pre-production validation |
| prod        | Manual           | Production deployment     |
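
The trigger matrix above might map onto a GitHub Actions workflow roughly like this. The file path, input names, and option values are illustrative assumptions, not the project's actual workflow:

```yaml
# .github/workflows/deploy.yml (illustrative sketch)
name: Deploy DABs
on:
  push:
    branches: [main]      # deploys to dev
  pull_request:
    branches: [main]      # validate & lint only
  workflow_dispatch:      # manual trigger for stage/prod
    inputs:
      target:
        description: "Bundle target"
        required: true
        type: choice
        options: [stage, prod]
permissions:
  id-token: write   # required for OIDC federation (see Prerequisites below)
  contents: read
```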

GitHub Secrets Required

Configure these secrets in your GitHub repository for each environment:

| Secret                 | Description                                                            | Required | Default                       |
|------------------------|------------------------------------------------------------------------|----------|-------------------------------|
| DATABRICKS_HOST        | Databricks workspace URL (e.g., https://dbc-xxx.cloud.databricks.com)   | Yes      | -                             |
| DATABRICKS_CLIENT_ID   | Service principal application ID                                        | Yes      | -                             |
| LAKEBASE_HOST          | Lakebase PostgreSQL host for DQ rule storage                            | No       | -                             |
| LAKEBASE_DATABASE      | Lakebase database name                                                  | No       | databricks_postgres           |
| MODEL_SERVING_ENDPOINT | Model serving endpoint for AI analysis                                  | No       | databricks-claude-sonnet-4-5  |
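
A deploy step might consume these secrets like the sketch below. The step name is illustrative, and `DATABRICKS_AUTH_TYPE: github-oidc` assumes the Databricks CLI's GitHub OIDC auth type is used instead of a stored token:

```yaml
# Illustrative deploy step; secret names match the table above
- name: Deploy bundle
  env:
    DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
    DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
    DATABRICKS_AUTH_TYPE: github-oidc   # assumption: OIDC federation, no PAT
  run: databricks bundle deploy -t dev
```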

Prerequisites: GitHub OIDC Federation

Before CI/CD will work, configure workload identity federation in Databricks:

1. Create a Service Principal

In Databricks Account Console, create a service principal.
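
As an alternative to the Account Console UI, the service principal can be created from the CLI. This is a sketch; the display name is illustrative:

```shell
# Create the service principal at the account level
databricks account service-principals create \
  --display-name "dqx-cicd-sp"

# From the response, note the numeric "id" (used as <SP_ID> below) and the
# "applicationId" (goes into the DATABRICKS_CLIENT_ID secret)
```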

2. Create Federation Policy

databricks account service-principal-federation-policy create <SP_ID> --json '{
  "oidc_policy": {
    "issuer": "https://token.actions.githubusercontent.com",
    "audiences": ["<DATABRICKS_ACCOUNT_ID>"],
    "subject": "repo:<GITHUB_ORG>/<REPO_NAME>:environment:<ENV>"
  }
}'
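
Because the policy subject pins a single GitHub environment, you would likely create one policy per environment (dev, stage, prod), each with the matching `<ENV>` value. To confirm what exists for a service principal (sketch; `<SP_ID>` is the numeric service principal ID):

```shell
# List federation policies attached to the service principal
databricks account service-principal-federation-policy list <SP_ID>
```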

3. Grant Workspace Access

Grant the service principal access to your workspace.
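
One way to do this from the CLI is via workspace assignments. This is a sketch; the permission level and placeholders are assumptions to adjust for your setup:

```shell
# Assign the service principal to the workspace
# <WORKSPACE_ID> is the numeric workspace ID, <SP_ID> the numeric SP ID
databricks account workspace-assignment update \
  <WORKSPACE_ID> <SP_ID> --json '{"permissions": ["USER"]}'
```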

See Enable workload identity federation for GitHub Actions for details.

Manual Deployment

Deploy using Databricks CLI:

# Install the Databricks CLI (v0.205+). Note: the legacy PyPI package
# (`pip install databricks-cli`) does NOT include the `bundle` commands.
brew tap databricks/tap && brew install databricks
# or: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

# Set the workspace and authenticate (OAuth login; or set DATABRICKS_TOKEN)
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
databricks auth login --host "$DATABRICKS_HOST"

# Validate bundle
databricks bundle validate -t dev

# Deploy to environment
databricks bundle deploy -t dev    # Development
databricks bundle deploy -t stage  # Staging
databricks bundle deploy -t prod   # Production
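
After a deploy, a bundle-defined job can be triggered directly. The resource key below is an assumption based on the file names under `resources/`; check those files for the actual keys:

```shell
# Run the DQ rule generation job in the dev target (resource key assumed)
databricks bundle run -t dev generation_job
```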

Bundle Configuration

The bundle uses a modular structure with Serverless compute:

databricks.yml                    # Main config (includes other files)
resources/
├── apps.yml                      # Databricks App definition
├── generation_job.yml            # DQ rule generation job (Serverless)
└── validation_job.yml            # DQ rule validation job (Serverless)
environments/
├── dev/
│   ├── targets.yml               # Dev target (mode: production)
│   ├── variables.yml             # Dev variables
│   └── permissions.yml           # Dev permissions
├── stage/
│   ├── targets.yml               # Stage target (mode: production)
│   ├── variables.yml             # Stage variables
│   └── permissions.yml           # Stage permissions
└── prod/
    ├── targets.yml               # Prod target (mode: production)
    ├── variables.yml             # Prod variables
    └── permissions.yml           # Prod permissions

Environment Differences

| Setting  | Development                | Staging                      | Production         |
|----------|----------------------------|------------------------------|--------------------|
| App Name | dqx-rule-generator-dev     | dqx-rule-generator-stage     | dqx-rule-generator |
| Job Name | DQ Rule Generation - Dev   | DQ Rule Generation - Stage   | DQ Rule Generation |
| Compute  | Serverless                 | Serverless                   | Serverless         |