# DQX Data Quality Manager
A Databricks App that uses AI assistance to generate and validate data quality rules with Databricks Labs DQX.
## What is DQX Data Quality Manager?
DQX Data Quality Manager provides an intuitive web interface for:
- Generating data quality rules using AI and natural language prompts
- Validating rules against your actual data with pass/fail statistics
- Storing rules with version control in Lakebase (PostgreSQL)
- Analyzing rule coverage and quality with AI-powered insights
Built on the Databricks platform, it leverages Unity Catalog for data access, serverless compute for rule generation/validation, and Lakebase for persistent storage.
## Key Features
- **AI-Powered Generation**: Generate comprehensive DQX-compatible data quality rules using natural language prompts. Simply describe what you want to check, and the AI creates the rules.
- **Rule Validation**: Validate generated rules against your actual data using serverless Databricks jobs. Get detailed pass/fail statistics for each rule.
- **Version Control**: Store and track rule versions in Lakebase (PostgreSQL) with full audit history. Roll back to previous versions when needed.
- **AI Analysis**: Get AI-powered insights on rule coverage, quality scores, and recommendations for improving your data quality checks.
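To make the validation statistics concrete, here is a minimal sketch of aggregating per-rule pass/fail counts into pass rates. The field names (`rule`, `passed`, `failed`) are illustrative, not the app's actual result schema:

```python
def pass_fail_stats(results):
    """Aggregate per-rule validation results into pass-rate statistics.

    `results` is a list of dicts like {"rule": str, "passed": int, "failed": int}.
    These field names are hypothetical, chosen for illustration only.
    """
    stats = {}
    for r in results:
        total = r["passed"] + r["failed"]
        stats[r["rule"]] = {
            "total": total,
            "pass_rate": r["passed"] / total if total else 0.0,
        }
    return stats

example = [
    {"rule": "id_is_not_null", "passed": 98, "failed": 2},
    {"rule": "amount_is_positive", "passed": 100, "failed": 0},
]
print(pass_fail_stats(example)["id_is_not_null"]["pass_rate"])  # 0.98
```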
## How It Works
1. **Select Data**: Browse Unity Catalog and select your target table
2. **Generate Rules**: Describe your requirements in natural language
3. **Review & Edit**: Review the AI-generated rules, edit as needed, and validate against your data
4. **Save**: Get AI analysis and save the rules with version control
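The rules generated in step 2 are DQX check definitions. A hedged sketch of what generated rules might look like, with a small structural sanity check; the exact argument keys (e.g. `col_name`) vary by `databricks-labs-dqx` version, so treat this as illustrative rather than the app's actual output:

```python
# Illustrative DQX-style check definitions; verify key names against the
# DQX documentation for your installed version before relying on them.
generated_rules = [
    {
        "name": "order_id_is_not_null",
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"col_name": "order_id"}},
    },
    {
        "name": "amount_in_range",
        "criticality": "warn",
        "check": {
            "function": "is_in_range",
            "arguments": {"col_name": "amount", "min_limit": 0, "max_limit": 10_000},
        },
    },
]

def looks_like_dqx_rule(rule: dict) -> bool:
    """Minimal structural sanity check before saving a rule (sketch)."""
    return (
        isinstance(rule.get("check"), dict)
        and "function" in rule["check"]
        and rule.get("criticality") in {"error", "warn"}
    )

print(all(looks_like_dqx_rule(r) for r in generated_rules))  # True
```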
## Architecture
### Authentication Model
DQX uses a dual authentication model for security:
| Component | Auth Method | Description |
|---|---|---|
| Unity Catalog | User Token (OBO) | Access data with user's permissions |
| AI Analysis | User Token (OBO) | Execute queries as the user |
| Jobs | App Service Principal | Trigger generation/validation jobs |
| Lakebase | User OAuth | Store rules with user identity |
> **On-Behalf-Of (OBO):** The app acts on behalf of the logged-in user for data access, ensuring users only see data they have permission to access. See Authentication for details.
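The dual-auth table above can be sketched as a credential-routing helper. Databricks Apps forward the logged-in user's token in the `X-Forwarded-Access-Token` request header when user authorization is configured; the component names and return shape below are hypothetical, for illustration only:

```python
import os

# Header Databricks Apps use to forward the logged-in user's token (OBO).
USER_TOKEN_HEADER = "X-Forwarded-Access-Token"

def pick_credential(component: str, headers: dict) -> dict:
    """Route each component to the right credential per the table above.

    Hypothetical helper: component names and the returned dict shape are
    illustrative, not the app's actual API.
    """
    if component in {"unity_catalog", "ai_analysis", "lakebase"}:
        # Act on behalf of the logged-in user.
        return {"auth": "user", "token": headers.get(USER_TOKEN_HEADER)}
    if component == "jobs":
        # Generation/validation jobs run as the app's service principal;
        # Databricks Apps inject its client ID/secret into the environment.
        return {
            "auth": "service_principal",
            "client_id": os.environ.get("DATABRICKS_CLIENT_ID"),
        }
    raise ValueError(f"unknown component: {component}")
```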
## Quick Start
### Prerequisites
| Requirement | Description |
|---|---|
| Databricks CLI | Install here |
| AWS Databricks Workspace | With Unity Catalog enabled |
| SQL Warehouse | Any warehouse (Serverless recommended) |
### Deploy in 5 Minutes
```bash
# 1. Clone the repository
git clone https://github.com/dediggibyte/databricks_dqx_agent.git
cd databricks_dqx_agent

# 2. Configure your workspace
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"

# 3. Deploy the bundle
databricks bundle validate -t dev
databricks bundle deploy -t dev
```
Access your app at: `https://your-workspace.cloud.databricks.com/apps/dqx-rule-generator-dev`
Start using DQX Data Quality Manager
## Technology Stack
| Component | Technology |
|---|---|
| Web Framework | Flask 3.0 with Gunicorn |
| Data Quality | Databricks Labs DQX |
| Compute | Databricks Serverless Jobs |
| Data Catalog | Unity Catalog |
| Storage | Lakebase (PostgreSQL) |
| AI | Claude Sonnet via Model Serving |
| Deployment | Databricks Asset Bundles (DAB) |
| CI/CD | GitHub Actions with OIDC |
## Documentation
| Document | Description |
|---|---|
| Quick Start | Complete deployment guide |
| Configuration | Environment variables and settings |
| Authentication | OBO and security details |
| Architecture | System design and structure |
| API Reference | REST API endpoints |
| DQX Checks | Available check functions |
| CI/CD Pipeline | GitHub Actions setup |
## License
This project is licensed under the MIT License - see the LICENSE file for details.