Configuration¶
This document describes all configuration options for DQX Data Quality Manager.
Overview¶
Configuration is managed at three levels:
| Level | Location | Purpose |
|---|---|---|
| App Runtime | src/app.yaml | Environment variables for the Flask app |
| Bundle Variables | environments/<env>/variables.yml | DAB deployment variables |
| Workspace | Environment variable | Databricks host URL |
App Runtime Configuration¶
src/app.yaml¶
This file configures the Databricks App runtime environment:
command:
  - gunicorn
  - --bind
  - 0.0.0.0:8000
  - --workers
  - "2"
  - --timeout
  - "300"
  - wsgi:app
env:
  # === Required ===
  - name: DQ_GENERATION_JOB_ID
    value: "<generation-job-id>"
  - name: DQ_VALIDATION_JOB_ID
    value: "<validation-job-id>"
  - name: SQL_WAREHOUSE_ID
    value: "<sql-warehouse-id>"
  # === Optional: Lakebase ===
  - name: LAKEBASE_HOST
    value: "<lakebase-host>.database.us-east-1.cloud.databricks.com"
  - name: LAKEBASE_DATABASE
    value: "databricks_postgres"
  - name: LAKEBASE_PORT
    value: "5432"
  # === Optional: AI Analysis ===
  - name: MODEL_SERVING_ENDPOINT
    value: "databricks-claude-sonnet-4-5"
  # === Optional: Data Sampling ===
  - name: SAMPLE_DATA_LIMIT
    value: "100"
Environment Variables Reference¶
Required Variables¶
| Variable | Description | Example |
|---|---|---|
| DQ_GENERATION_JOB_ID | Databricks Job ID for rule generation | 123456789 |
| DQ_VALIDATION_JOB_ID | Databricks Job ID for rule validation | 987654321 |
| SQL_WAREHOUSE_ID | SQL Warehouse ID for queries | abc123def456 |
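Because the app cannot trigger jobs or run queries without these three values, a startup check can surface a missing one early. The following is a minimal sketch, not the app's actual code: the function names `missing_required` and `check_required` are hypothetical, and treating a blank value as missing is an assumption.

```python
import os

# Required variables as listed in the table above.
REQUIRED_VARS = ("DQ_GENERATION_JOB_ID", "DQ_VALIDATION_JOB_ID", "SQL_WAREHOUSE_ID")


def missing_required(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]


def check_required(env=None):
    """Raise a RuntimeError that names every missing required variable."""
    missing = missing_required(env)
    if missing:
        raise RuntimeError("Missing required configuration: " + ", ".join(missing))
```

Calling `check_required()` once at startup would report all missing names in a single error instead of failing on the first job run.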
Getting Job IDs
After deploying the bundle, get job IDs with:
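One possible way to collect the IDs programmatically is to parse the JSON output of `databricks bundle summary`. This is a sketch under assumptions: it assumes the summary JSON exposes deployed jobs under `resources.jobs.<key>.id`, and the helper names `extract_job_ids` and `load_summary` are hypothetical.

```python
import json
import subprocess


def extract_job_ids(summary):
    """Map each job's resource key to its deployed job ID.

    Assumes the summary has the shape
    {"resources": {"jobs": {"<key>": {"id": "..."}}}}.
    Jobs without an ID (not yet deployed) are skipped.
    """
    jobs = summary.get("resources", {}).get("jobs", {})
    return {key: str(job["id"]) for key, job in jobs.items() if job.get("id")}


def load_summary(target):
    """Run `databricks bundle summary` for a target and parse its JSON output."""
    result = subprocess.run(
        ["databricks", "bundle", "summary", "-t", target, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, `extract_job_ids(load_summary("dev"))` would return a dictionary whose values can be copied into DQ_GENERATION_JOB_ID and DQ_VALIDATION_JOB_ID.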
Lakebase Variables (Optional)¶
| Variable | Description | Default |
|---|---|---|
| LAKEBASE_HOST | Lakebase PostgreSQL hostname | - |
| LAKEBASE_DATABASE | Database name | databricks_postgres |
| LAKEBASE_PORT | PostgreSQL port | 5432 |
Lakebase Authentication
Lakebase uses OAuth authentication. The user's forwarded token is used as the password. See Authentication.
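To illustrate how the forwarded token stands in for a password, the sketch below assembles PostgreSQL connection arguments from the Lakebase variables. It is illustrative only: the function name `lakebase_conn_params`, the libpq-style key names, the way the user name is supplied, and `sslmode="require"` are all assumptions rather than details taken from the app.

```python
import os


def lakebase_conn_params(user, forwarded_token, env=None):
    """Build PostgreSQL connection keyword arguments for Lakebase."""
    env = os.environ if env is None else env
    host = env.get("LAKEBASE_HOST")
    if not host:
        # LAKEBASE_HOST has no default, so Lakebase is unavailable without it.
        raise ValueError("LAKEBASE_HOST is not set")
    return {
        "host": host,
        "dbname": env.get("LAKEBASE_DATABASE", "databricks_postgres"),
        "port": int(env.get("LAKEBASE_PORT", "5432")),
        "user": user,
        "password": forwarded_token,  # the OAuth token is used as the password
        "sslmode": "require",  # assumption: the connection requires TLS
    }
```

The resulting dictionary could be passed to a PostgreSQL driver's connect call. Because OAuth tokens expire, a connection built this way would need to be re-created with a fresh token rather than cached indefinitely.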
AI Analysis Variables (Optional)¶
| Variable | Description | Default |
|---|---|---|
| MODEL_SERVING_ENDPOINT | Model serving endpoint for AI analysis | databricks-claude-sonnet-4-5 |
Data Sampling Variables (Optional)¶
| Variable | Description | Default |
|---|---|---|
| SAMPLE_DATA_LIMIT | Maximum rows to display in sample preview | 100 |
Bundle Variables¶
databricks.yml (Declaration)¶
Variables are declared in the main bundle configuration:
# databricks.yml
variables:
  app_name:
    description: "Name of the Databricks App"
  app_description:
    description: "Description of the Databricks App"
  job_name:
    description: "Name of the DQ rule generation job"
  notebook_path:
    description: "Path to the DQ rule generation notebook"
  validation_job_name:
    description: "Name of the DQ rule validation job"
  validation_notebook_path:
    description: "Path to the DQ rule validation notebook"
  sql_warehouse_id:
    description: "SQL Warehouse ID for app to execute queries"
    default: ""  # Overridden by CI/CD --var flag
Environment Variables Files¶
Values are set per environment in environments/<env>/variables.yml:
# environments/dev/variables.yml
targets:
  dev:
    variables:
      app_name: "dqx-rule-generator-dev"
      app_description: "DQX Data Quality Manager - Dev Environment"
      job_name: "DQ Rule Generation - Dev"
      # Notebooks deployed with bundle to workspace.root_path
      notebook_path: "${workspace.root_path}/notebooks/generate_dq_rules_fast"
      validation_job_name: "DQ Rule Validation - Dev"
      validation_notebook_path: "${workspace.root_path}/notebooks/validate_dq_rules"

# environments/stage/variables.yml
targets:
  stage:
    variables:
      app_name: "dqx-rule-generator-stage"
      app_description: "DQX Data Quality Manager - Stage Environment"
      job_name: "DQ Rule Generation - Stage"
      notebook_path: "${workspace.root_path}/notebooks/generate_dq_rules_fast"
      validation_job_name: "DQ Rule Validation - Stage"
      validation_notebook_path: "${workspace.root_path}/notebooks/validate_dq_rules"

# environments/prod/variables.yml
targets:
  prod:
    variables:
      app_name: "dqx-rule-generator"
      app_description: "DQX Data Quality Manager - Production"
      job_name: "DQ Rule Generation"
      notebook_path: "${workspace.root_path}/notebooks/generate_dq_rules_fast"
      validation_job_name: "DQ Rule Validation"
      validation_notebook_path: "${workspace.root_path}/notebooks/validate_dq_rules"
Bundle Variables Reference¶
| Variable | Description | Example |
|---|---|---|
| app_name | Databricks App name | dqx-rule-generator-dev |
| app_description | App description shown in UI | DQX Data Quality Manager |
| job_name | Generation job name | DQ Rule Generation - Dev |
| notebook_path | Path to generation notebook | ${workspace.root_path}/notebooks/... |
| validation_job_name | Validation job name | DQ Rule Validation - Dev |
| validation_notebook_path | Path to validation notebook | ${workspace.root_path}/notebooks/... |
| sql_warehouse_id | SQL Warehouse ID | Passed via --var in CI/CD |
Notebook Path
Use ${workspace.root_path}/notebooks/... to reference notebooks deployed with the bundle. This ensures the service principal running jobs can access the notebooks.
Workspace Configuration¶
Host URL¶
Set the Databricks workspace URL via the DATABRICKS_HOST environment variable, for example https://your-workspace.cloud.databricks.com.
This is used for:
- Local CLI deployment
- CI/CD workflows (from GitHub secrets)
- SDK initialization fallback
Target Configuration¶
Each environment has target settings in environments/<env>/targets.yml:
# environments/dev/targets.yml
targets:
  dev:
    mode: production
    default: true
    workspace:
      # Bundle files deployed to this path
      root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}
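To make the `${...}` placeholders in root_path concrete, the sketch below performs the same kind of substitution on a plain string. It only illustrates the idea and is not how the Databricks CLI resolves variables. The helper name `expand`, the sample user, and the bundle name `dqx` are hypothetical.

```python
import re

# Matches ${dotted.name} placeholders.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(template, values):
    """Replace each ${name} with its value; unknown names are left untouched."""
    return _PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)


root_path = expand(
    "/Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}",
    {
        "workspace.current_user.userName": "dev@example.com",  # hypothetical user
        "bundle.name": "dqx",  # hypothetical bundle name
        "bundle.target": "dev",
    },
)
print(root_path)  # /Workspace/Users/dev@example.com/.bundle/dqx/dev
```

Because the deploying identity is part of the path, a deployment run by a service principal would land in a different folder than one run by a developer.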
CI/CD Secrets¶
For automated deployments via GitHub Actions, configure these secrets per environment:
Required Secrets¶
| Secret | Description |
|---|---|
| DATABRICKS_HOST | Workspace URL |
| DATABRICKS_CLIENT_ID | Service Principal Application ID |
| SQL_WAREHOUSE_ID | SQL Warehouse ID |
Optional Secrets¶
| Secret | Description | Default |
|---|---|---|
| LAKEBASE_HOST | Lakebase PostgreSQL host | - |
| LAKEBASE_DATABASE | Lakebase database name | databricks_postgres |
| MODEL_SERVING_ENDPOINT | Model serving endpoint | databricks-claude-sonnet-4-5 |
| SAMPLE_DATA_LIMIT | Sample data row limit | 100 |
Setting Up Secrets¶
- Go to GitHub repository → Settings → Secrets and variables → Actions
- Click "New repository secret" for each secret
- For environment-specific secrets, create GitHub environments first
# Example: Setting secrets via GitHub CLI
gh secret set DATABRICKS_HOST --env dev --body "https://your-workspace.cloud.databricks.com"
gh secret set SQL_WAREHOUSE_ID --env dev --body "abc123def456"
Local Development¶
For local development without Databricks Apps:
# Required
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"
export DQ_GENERATION_JOB_ID="your-generation-job-id"
export DQ_VALIDATION_JOB_ID="your-validation-job-id"
export SQL_WAREHOUSE_ID="your-warehouse-id"
# Optional
export LAKEBASE_HOST="your-lakebase-host"
export MODEL_SERVING_ENDPOINT="databricks-claude-sonnet-4-5"
# Run the app
cd src
python wsgi.py
Token Fallback
When running locally, the app uses DATABRICKS_TOKEN as a fallback when no x-forwarded-access-token header is present.
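The fallback can be pictured as a two-step lookup: the forwarded header first, then the environment. The following is a minimal sketch of that behaviour, not the app's implementation. The function name `resolve_token` and the choice to raise `PermissionError` when neither source is available are assumptions.

```python
import os


def resolve_token(headers, env=None):
    """Prefer the forwarded user token; fall back to DATABRICKS_TOKEN."""
    env = os.environ if env is None else env
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    token = lowered.get("x-forwarded-access-token") or env.get("DATABRICKS_TOKEN")
    if not token:
        raise PermissionError("No forwarded token and DATABRICKS_TOKEN is not set")
    return token
```

Inside Databricks Apps the header is present, so the personal access token is never consulted. Locally the header is absent and the exported DATABRICKS_TOKEN is used.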
Configuration Hierarchy¶
Configuration values are resolved in this order:
- Environment variables in src/app.yaml (highest priority)
- Bundle variables from environments/<env>/variables.yml
- Default values in databricks.yml
- Runtime fallbacks in src/app/config.py
# src/app/config.py
import os


class Config:
    # Values are read from the environment; optional settings fall back to defaults.
    DATABRICKS_HOST = os.getenv("DATABRICKS_HOST")
    DATABRICKS_TOKEN = os.getenv("DATABRICKS_TOKEN")
    DQ_GENERATION_JOB_ID = os.getenv("DQ_GENERATION_JOB_ID")
    DQ_VALIDATION_JOB_ID = os.getenv("DQ_VALIDATION_JOB_ID")
    SQL_WAREHOUSE_ID = os.getenv("SQL_WAREHOUSE_ID")
    LAKEBASE_HOST = os.getenv("LAKEBASE_HOST")
    LAKEBASE_DATABASE = os.getenv("LAKEBASE_DATABASE", "databricks_postgres")
    LAKEBASE_PORT = int(os.getenv("LAKEBASE_PORT", "5432"))
    MODEL_SERVING_ENDPOINT = os.getenv("MODEL_SERVING_ENDPOINT", "databricks-claude-sonnet-4-5")
    SAMPLE_DATA_LIMIT = int(os.getenv("SAMPLE_DATA_LIMIT", "100"))
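The "first match wins" ordering can be modelled with `collections.ChainMap`. This is a simplified sketch: it treats each level as a plain dictionary keyed by the same names, which glosses over how bundle variables are actually substituted at deploy time. The `resolve` helper and the sample values are hypothetical.

```python
from collections import ChainMap


def resolve(app_yaml_env, bundle_vars, bundle_defaults, runtime_fallbacks):
    """Combine the four levels; a key found in an earlier mapping wins."""
    return ChainMap(app_yaml_env, bundle_vars, bundle_defaults, runtime_fallbacks)


settings = resolve(
    {"SAMPLE_DATA_LIMIT": "50"},  # set in src/app.yaml
    {},                           # environments/<env>/variables.yml
    {},                           # defaults in databricks.yml
    {"SAMPLE_DATA_LIMIT": "100", "LAKEBASE_PORT": "5432"},  # config.py fallbacks
)
print(settings["SAMPLE_DATA_LIMIT"])  # 50
print(settings["LAKEBASE_PORT"])      # 5432
```

Here the app.yaml value overrides the runtime fallback for SAMPLE_DATA_LIMIT, while LAKEBASE_PORT falls through to the default because no higher level sets it.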
Validation¶
Validate Bundle Configuration¶
# Validate bundle configuration
databricks bundle validate -t dev
# Check resolved variables
databricks bundle validate -t dev --output json | jq '.variables'
Check App Configuration¶
After deployment, verify app environment:
Or check via the app's debug endpoint:
Related Documentation¶
- Quick Start - Deployment guide with configuration steps
- Authentication - Auth configuration details
- CI/CD Pipeline - GitHub secrets setup