
DQX Data Quality Manager

A Databricks App for generating and validating data quality rules using AI assistance with Databricks DQX.


What is DQX Data Quality Manager?

DQX Data Quality Manager provides an intuitive web interface for:

  • Generating data quality rules using AI and natural language prompts
  • Validating rules against your actual data with pass/fail statistics
  • Storing rules with version control in Lakebase (PostgreSQL)
  • Analyzing rule coverage and quality with AI-powered insights

Built on the Databricks platform, it leverages Unity Catalog for data access, serverless compute for rule generation/validation, and Lakebase for persistent storage.
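To make the rule format concrete, here is a minimal sketch of what a generated rule set might look like as plain Python structures. The field names (`criticality`, `check`, `function`, `arguments`) mirror the DQX YAML convention but are illustrative here, not a guaranteed schema; confirm names against the DQX documentation.

```python
# Hedged sketch of an AI-generated, DQX-style rule set.
# Field names follow the DQX YAML convention but are illustrative.
rules = [
    {
        "name": "customer_id_is_not_null",
        "criticality": "error",
        "check": {
            "function": "is_not_null",
            "arguments": {"col_name": "customer_id"},
        },
    },
    {
        "name": "amount_in_range",
        "criticality": "warn",
        "check": {
            "function": "is_in_range",
            "arguments": {"col_name": "amount", "min_limit": 0, "max_limit": 10_000},
        },
    },
]

for rule in rules:
    print(rule["name"], "->", rule["check"]["function"])
```

Each rule pairs a check function with a criticality level, which is what the validation step later reports pass/fail statistics against.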


Key Features

  • AI-Powered Generation
    Generate comprehensive DQX-compatible data quality rules using natural language prompts. Describe what you want to check, and AI creates the rules.

  • Rule Validation
    Validate generated rules against your actual data using serverless Databricks jobs, with detailed pass/fail statistics for each rule.

  • Version Control
    Store and track rule versions in Lakebase (PostgreSQL) with full audit history. Roll back to previous versions when needed.

  • AI Analysis
    Get AI-powered insights on rule coverage, quality scores, and recommendations for improving your data quality checks.
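As an illustration of the pass/fail statistics the validation feature produces, the toy sketch below evaluates two stand-in predicate rules over in-memory rows. In the real app this evaluation runs as a serverless Databricks job against the selected table; the predicates and row shape here are assumptions for illustration.

```python
# Toy sketch: per-rule pass/fail statistics over in-memory rows.
# In DQX Data Quality Manager this runs as a serverless job against
# the actual table; the predicates here are illustrative stand-ins.
rows = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": None, "amount": 35.5},
    {"customer_id": 3, "amount": -10.0},
]

rules = {
    "customer_id_is_not_null": lambda r: r["customer_id"] is not None,
    "amount_is_non_negative": lambda r: r["amount"] >= 0,
}

stats = {}
for name, predicate in rules.items():
    passed = sum(1 for r in rows if predicate(r))
    stats[name] = {"pass": passed, "fail": len(rows) - passed}

print(stats)
```

Each rule yields a pass count and a fail count, which is the shape of the statistics surfaced in the review step.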


How It Works

  1. Select Data - Browse Unity Catalog and select your target table
  2. Generate Rules - Describe your requirements in natural language
  3. Review & Edit - Review AI-generated rules, edit as needed, validate against data
  4. Save - Get AI analysis and save rules with version control
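The save step above can be sketched as a versioned, append-only insert. The table shape and function names below are hypothetical, standing in for the app's actual Lakebase schema:

```python
# Hedged sketch of version-controlled rule storage. In the app this is
# a Lakebase (PostgreSQL) table; an in-memory list stands in for it here.
import json
from datetime import datetime, timezone

rule_versions = []  # stand-in for a hypothetical dqx_rule_versions table

def save_rules(table_name, rules, author):
    """Append a new immutable version of the rule set for a table."""
    prior = [v for v in rule_versions if v["table_name"] == table_name]
    version = len(prior) + 1
    rule_versions.append({
        "table_name": table_name,
        "version": version,
        "rules_json": json.dumps(rules),
        "author": author,
        "created_at": datetime.now(timezone.utc).isoformat(),
    })
    return version

def rollback(table_name, version):
    """Roll back by re-reading an earlier version's rules."""
    for v in rule_versions:
        if v["table_name"] == table_name and v["version"] == version:
            return json.loads(v["rules_json"])
    raise KeyError(f"no version {version} for {table_name}")

v1 = save_rules("main.sales.orders", [{"check": "is_not_null"}], "alice")
v2 = save_rules("main.sales.orders", [{"check": "is_in_range"}], "alice")
print(v1, v2, rollback("main.sales.orders", 1))
```

Because versions are append-only, rollback is simply a read of an earlier version, which preserves the full audit history.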

Architecture

Figure: DQX Data Quality Manager architecture, showing the Flask app with Unity Catalog, Serverless Jobs, and Lakebase.

Authentication Model

DQX uses a dual authentication model for security: data access runs under the logged-in user's identity, while job orchestration uses the app's service principal:

| Component     | Auth Method           | Description                             |
|---------------|-----------------------|-----------------------------------------|
| Unity Catalog | User Token (OBO)      | Access data with the user's permissions |
| AI Analysis   | User Token (OBO)      | Execute queries as the user             |
| Jobs          | App Service Principal | Trigger generation/validation jobs      |
| Lakebase      | User OAuth            | Store rules with the user's identity    |

On-Behalf-Of (OBO)

The app acts on behalf of the logged-in user for data access, ensuring users only see data they have permission to access. See Authentication for details.
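A minimal sketch of the OBO pattern follows. It assumes the Databricks Apps convention of forwarding the user's token in the `X-Forwarded-Access-Token` request header (confirm the exact header name against the Apps documentation); `pick_token` and its arguments are illustrative names, not the app's actual API.

```python
# Hedged sketch of the dual-auth pattern: prefer the user's forwarded
# token for data access (OBO), fall back to the app's service principal
# for job orchestration. Header name per the Databricks Apps convention;
# verify it against the Apps documentation.
def pick_token(headers, sp_token, purpose):
    user_token = headers.get("X-Forwarded-Access-Token")
    if purpose == "data" and user_token:
        return user_token  # OBO: query Unity Catalog as the user
    return sp_token        # app identity: trigger generation/validation jobs

headers = {"X-Forwarded-Access-Token": "user-token-abc"}
print(pick_token(headers, "sp-token-xyz", "data"))  # user's token
print(pick_token(headers, "sp-token-xyz", "jobs"))  # service principal token
```

Routing data reads through the user's token is what guarantees users only see tables they already have Unity Catalog permissions on.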


Quick Start

Prerequisites

| Requirement              | Description                            |
|--------------------------|----------------------------------------|
| Databricks CLI           | Installed and authenticated            |
| AWS Databricks Workspace | With Unity Catalog enabled             |
| SQL Warehouse            | Any warehouse (Serverless recommended) |

Deploy in 5 Minutes

# 1. Clone the repository
git clone https://github.com/dediggibyte/databricks_dqx_agent.git
cd databricks_dqx_agent

# 2. Configure your workspace
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"

# 3. Deploy the bundle
databricks bundle validate -t dev
databricks bundle deploy -t dev

Access your app at: https://your-workspace.cloud.databricks.com/apps/dqx-rule-generator-dev



Technology Stack

| Component    | Technology                      |
|--------------|---------------------------------|
| Web Framework | Flask 3.0 with Gunicorn        |
| Data Quality | Databricks Labs DQX             |
| Compute      | Databricks Serverless Jobs      |
| Data Catalog | Unity Catalog                   |
| Storage      | Lakebase (PostgreSQL)           |
| AI           | Claude Sonnet via Model Serving |
| Deployment   | Databricks Asset Bundles (DAB)  |
| CI/CD        | GitHub Actions with OIDC        |

Documentation

| Document       | Description                        |
|----------------|------------------------------------|
| Quick Start    | Complete deployment guide          |
| Configuration  | Environment variables and settings |
| Authentication | OBO and security details           |
| Architecture   | System design and structure        |
| API Reference  | REST API endpoints                 |
| DQX Checks     | Available check functions          |
| CI/CD Pipeline | GitHub Actions setup               |


License

This project is licensed under the MIT License - see the LICENSE file for details.