# Authentication

This document describes the authentication architecture used by DQX Data Quality Manager.
## Overview

DQX Data Quality Manager uses a dual authentication model, acting under the user's own identity for data access and under the app's service principal for job execution:

| Operation | Authentication Method | Why |
|---|---|---|
| SQL queries (Unity Catalog) | User token (OBO) | Access data with the user's permissions |
| AI analysis (`ai_query`) | User token (OBO) | Execute queries as the user |
| Job execution | App service principal | No `jobs` scope is available for user authorization |
| Lakebase storage | User OAuth token | Store rules under the user's identity |
## On-Behalf-Of (OBO) Authentication

### What is OBO?

On-Behalf-Of (OBO) authentication allows the app to act on behalf of the logged-in user. When a user accesses the app through Databricks Apps, their OAuth token is forwarded to the application via the `x-forwarded-access-token` HTTP header.

```mermaid
sequenceDiagram
    participant User
    participant DatabricksApps as Databricks Apps
    participant App as DQX App
    participant Unity as Unity Catalog

    User->>DatabricksApps: Login with Databricks credentials
    DatabricksApps->>App: Forward request + x-forwarded-access-token
    App->>Unity: SQL query with user's token
    Unity->>Unity: Check user's permissions
    Unity->>App: Return data (filtered by permissions)
    App->>User: Display results
```
### Benefits of OBO

- **Security**: Users only see data they have permission to access
- **Audit trail**: All actions are logged under the user's identity
- **No elevated privileges**: The app doesn't need broad data access permissions
- **Compliance**: Data access follows organizational policies
## SQL Operations (OBO)

All Unity Catalog operations use the user's forwarded token.

### How It Works
```python
# src/app/services/databricks.py
from typing import Optional

from databricks import sql
from flask import has_request_context, request


def _get_user_token(self) -> Optional[str]:
    """Get the user's access token from the request headers."""
    if has_request_context():
        return request.headers.get('x-forwarded-access-token')
    return None


def _get_sql_connection(self):
    """Get a SQL connection using the user's token (OBO)."""
    user_token = self._get_user_token()
    if user_token:
        # OBO authentication with the user's forwarded token;
        # host and http_path come from the service's configuration
        return sql.connect(
            server_hostname=host,
            http_path=http_path,
            access_token=user_token,  # user's forwarded token
        )
```
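
As a usage sketch, here is how a caller might run a catalog listing over that connection. The `service` argument and `list_catalogs` helper are hypothetical; the cursor API is standard `databricks-sql-connector`:

```python
def list_catalogs(service) -> list:
    """List the catalogs visible to the current user via OBO."""
    # `service` is assumed to expose _get_sql_connection() as shown above
    with service._get_sql_connection() as conn:
        with conn.cursor() as cursor:
            cursor.execute("SHOW CATALOGS")
            # The first column of each row is the catalog name
            return [row[0] for row in cursor.fetchall()]
```

Because the token carries the user's identity, two users calling the same helper can see different catalog lists.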
### Operations Using OBO

| Operation | SQL Command | User Permission Required |
|---|---|---|
| List catalogs | `SHOW CATALOGS` | `USE CATALOG` |
| List schemas | `SHOW SCHEMAS IN catalog` | `USE SCHEMA` |
| List tables | `SHOW TABLES IN catalog.schema` | `USE SCHEMA` |
| Sample data | `SELECT * FROM table LIMIT n` | `SELECT` on the table |
| AI analysis | `SELECT ai_query(...)` | Model Serving access |
### Configuration

To enable SQL OBO authentication, configure the following in your app:

```yaml
# resources/apps.yml
resources:
  apps:
    dqx_app:
      # Request the SQL scope for OBO
      user_api_scopes:
        - sql
      # Grant access to the SQL warehouse
      resources:
        - name: "dqx-sql-warehouse"
          sql_warehouse:
            id: ${var.sql_warehouse_id}
            permission: "CAN_USE"
```
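
If the `sql` scope is missing, the forwarded token never reaches the app, so it can be useful to fail fast rather than fall through to an unauthenticated connection. A minimal sketch of such a guard using Flask's `before_request` hook; the `/api/` route prefix and the error payload are assumptions, not taken from the app:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.before_request
def require_forwarded_token():
    """Reject API calls that arrive without the OBO header (sketch)."""
    # In production, Databricks Apps injects x-forwarded-access-token;
    # a missing header usually means a misconfigured scope.
    # (Local development would need an exemption for the PAT fallback.)
    if request.path.startswith("/api/") and \
            not request.headers.get("x-forwarded-access-token"):
        return jsonify({"error": "missing forwarded access token"}), 401
```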
## Job Operations (Service Principal)

Job operations use the app service principal instead of user tokens because there is no `jobs` scope available for user authorization in Databricks Apps.

### How It Works
```python
# src/app/services/databricks.py
def trigger_dq_job(self, table_name: str, user_prompt: str) -> Dict:
    """Trigger a job using the app SP credentials (not the user token)."""
    # use_user_token=False means use the app's service principal
    client = self._get_client(use_user_token=False)
    response = client.jobs.run_now(
        job_id=int(Config.DQ_GENERATION_JOB_ID),
        job_parameters={...},
    )
    return {"run_id": response.run_id}
```
### Permission Setup

The app's service principal needs the `CAN_MANAGE_RUN` permission on the jobs it triggers:

```yaml
# resources/apps.yml
resources:
  apps:
    dqx_app:
      resources:
        # Grant the app SP permission to run the generation job
        - name: "dqx-generation-job"
          job:
            id: ${resources.jobs.dq_rule_generation.id}
            permission: "CAN_MANAGE_RUN"
        # Grant the app SP permission to run the validation job
        - name: "dqx-validation-job"
          job:
            id: ${resources.jobs.dq_rule_validation.id}
            permission: "CAN_MANAGE_RUN"
```
## Lakebase Authentication (OAuth)

Lakebase (Databricks PostgreSQL) uses OAuth authentication with the user's forwarded token.

### How It Works
```python
# src/app/services/lakebase.py
import psycopg2


def get_connection():
    """Get a PostgreSQL connection using OAuth."""
    user_email, user_token = get_user_oauth_credentials()
    conn = psycopg2.connect(
        host=Config.LAKEBASE_HOST,
        database=Config.LAKEBASE_DATABASE,
        user=user_email,       # user's email from the token
        password=user_token,   # OAuth token used as the password
        port=5432,
        sslmode='require',
    )
    return conn
```
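
A hedged sketch of using that connection to persist a rule; the `dq_rules` table and its columns are illustrative, not the app's actual schema:

```python
def save_rule(rule_name: str, rule_json: str) -> None:
    """Insert a rule over the OAuth-authenticated connection (sketch)."""
    conn = get_connection()
    try:
        with conn.cursor() as cur:
            # Parameterized query: never interpolate user input into SQL
            cur.execute(
                "INSERT INTO dq_rules (name, definition) VALUES (%s, %s)",
                (rule_name, rule_json),
            )
        conn.commit()
    finally:
        conn.close()
```

Because the connection is opened as the user, Postgres-side grants and audit logs also reflect the user's identity.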
### User Identification

The user's email is extracted from:

1. the `x-forwarded-email` header (if available), or
2. a Databricks API call that uses the token to look up the current user.

```python
from typing import Tuple

from databricks.sdk import WorkspaceClient
from flask import request


def get_user_oauth_credentials() -> Tuple[str, str]:
    user_token = request.headers.get('x-forwarded-access-token')
    user_email = request.headers.get('x-forwarded-email')
    if not user_email:
        # Fallback: query Databricks for the user's info
        # (the workspace host is resolved from the app's environment)
        ws = WorkspaceClient(token=user_token)
        current_user = ws.current_user.me()
        user_email = current_user.user_name
    return user_email, user_token
```
## AI Analysis Authentication

AI analysis uses the Statement Execution API with the user's token:

```python
# src/app/services/ai.py
def analyze_rules(rules, table_name, user_prompt):
    # Get a WorkspaceClient with the user's token (OBO)
    ws = databricks_service.client
    # Execute ai_query via the Statement Execution API
    response = ws.statement_execution.execute_statement(
        warehouse_id=warehouse_id,
        statement=f"SELECT ai_query('{model}', '{prompt}')",
        wait_timeout="0s",
    )
```
## Authentication Flow Diagram

```mermaid
flowchart TB
    subgraph User["User Browser"]
        A[User Login]
    end
    subgraph DatabricksApps["Databricks Apps Platform"]
        B[OAuth Authentication]
        C[Token Forwarding]
    end
    subgraph App["DQX App"]
        D[Flask Application]
        E{Operation Type?}
        F[SQL Service<br/>OBO Auth]
        G[Job Service<br/>SP Auth]
        H[Lakebase Service<br/>OAuth]
        I[AI Service<br/>OBO Auth]
    end
    subgraph Resources["Databricks Resources"]
        J[(Unity Catalog)]
        K[Serverless Jobs]
        L[(Lakebase)]
        M[Model Serving]
    end

    A --> B
    B --> C
    C -->|x-forwarded-access-token| D
    D --> E
    E -->|SQL Query| F
    E -->|Trigger Job| G
    E -->|Save Rules| H
    E -->|AI Analysis| I
    F -->|User Token| J
    G -->|SP Credentials| K
    H -->|User OAuth| L
    I -->|User Token| M
```
## Local Development

For local development without Databricks Apps, use a personal access token:

```bash
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"
export DQ_GENERATION_JOB_ID="your-job-id"
```
The app falls back to the configured token when no `x-forwarded-access-token` header is present:

```python
if user_token:
    # OBO authentication (production)
    return sql.connect(..., access_token=user_token)
elif Config.DATABRICKS_TOKEN:
    # Fall back to the configured token (local dev)
    return sql.connect(..., access_token=Config.DATABRICKS_TOKEN)
```
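
Before starting the app locally, the fallback credentials can be sanity-checked with the SDK, which reads `DATABRICKS_HOST` and `DATABRICKS_TOKEN` from the environment:

```python
from databricks.sdk import WorkspaceClient

# Picks up DATABRICKS_HOST and DATABRICKS_TOKEN from the environment
ws = WorkspaceClient()
me = ws.current_user.me()
print(f"Authenticated to {ws.config.host} as {me.user_name}")
```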
## Security Best Practices

- **Never log tokens**: Avoid logging access tokens in production
- **Use HTTPS**: All connections use TLS encryption
- **Minimal scopes**: Request only the necessary API scopes (`sql`)
- **Audit logging**: All operations are logged with the user's identity
- **Token expiration**: OAuth tokens have a limited lifetime
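
For the first point, one option is to log only non-sensitive token metadata. A small illustrative helper (not part of the app):

```python
import hashlib


def describe_token(token: str) -> str:
    """Return a log-safe description of a token: length plus a short digest."""
    digest = hashlib.sha256(token.encode()).hexdigest()[:8]
    return f"token(len={len(token)}, sha256={digest}...)"
```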
## Troubleshooting

### Common Issues
| Issue | Cause | Solution |
|---|---|---|
| "No catalogs available" | Missing `sql` scope or warehouse permission | Check that `user_api_scopes` includes `sql` |
| "Job failed to start" | SP missing job permissions | Verify `CAN_MANAGE_RUN` in `apps.yml` |
| "Lakebase connection failed" | Invalid OAuth token | Ensure the user is logged in via Databricks Apps |
| "AI analysis unavailable" | Token doesn't have model access | Grant the user access to the model serving endpoint |
### Debug Mode

Enable debug logging to see authentication details:

```python
# In databricks.py
print(f"[DEBUG] User token found (length: {len(token)})")
print(f"[DEBUG] Creating SQL connection with OBO auth, host={host}")
```
Check app logs in Databricks Console → Compute → Apps → Logs.