Data Quality Engineering Framework
End-to-end operating framework for data quality in modern DE teams — dimensions, rule taxonomy, execution layers, ownership, escalation paths, and KPI scorecards.
415 lines. Replace {{PLACEHOLDERS}} with your environment values, then deploy to your stack.
# Data Quality Engineering Framework
**Version:** {{FRAMEWORK_VERSION}}
**Owner:** {{DATA_ENGINEERING_TEAM}}
**Last Updated:** {{LAST_UPDATED_DATE}}
**Applies To:** {{ORGANIZATION_NAME}} - all production data pipelines and data products
---
## Executive Summary
This framework provides an end-to-end **Data Quality (DQ)** operating model for data engineering teams. It defines quality dimensions, a rule taxonomy, execution layers (dbt, Great Expectations, SQL), ownership, escalation paths, metrics, and operating rhythms so quality is **designed in**, not bolted on after incidents.
**Goals:**
- Prevent bad data from reaching gold/consumption layers
- Make quality expectations explicit, testable, and versioned
- Assign clear ownership for detection, triage, and remediation
- Measure and improve quality systematically over time
---
## 1. Scope and Definitions
### 1.1 In Scope
- Batch and micro-batch pipelines (dbt, Spark, Glue, SQL)
- Streaming pipelines with quality checkpoints (see companion streaming framework)
- Source-to-gold path for curated data products
- Third-party and SaaS ingest with SLA-backed validation
### 1.2 Key Definitions
| Term | Definition |
|------|------------|
| **Data Quality Rule** | Executable assertion tied to a dataset column, row set, or pipeline stage |
| **Rule Severity** | Critical / High / Medium / Low - determines blocking behavior |
| **Quality Gate** | Pipeline stage that halts promotion on rule failure |
| **Data Owner** | Accountable party for business correctness of a dataset |
| **Pipeline Owner** | Engineering team responsible for implementation and monitoring |
| **Incident** | Sustained or critical failure affecting downstream consumers |
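
The definitions above can be sketched as a small data model. This is a minimal illustration, not part of the framework itself: the names `Severity`, `Rule`, `no_null_ids`, and the `orders` / `order_id` dataset are hypothetical, and a real deployment would more likely express rules in dbt or Great Expectations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Mapping

Row = Mapping[str, object]


class Severity(Enum):
    """Rule severity levels from the definitions table."""
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


@dataclass(frozen=True)
class Rule:
    """An executable assertion tied to a dataset (hypothetical shape)."""
    name: str
    dataset: str
    severity: Severity
    check: Callable[[Iterable[Row]], bool]  # returns True when the assertion holds


def no_null_ids(rows: Iterable[Row]) -> bool:
    """Hypothetical check: every row carries a non-null order_id."""
    return all(row.get("order_id") is not None for row in rows)


rule = Rule("orders_id_not_null", "orders", Severity.CRITICAL, no_null_ids)

good_rows = [{"order_id": 1}, {"order_id": 2}]
bad_rows = [{"order_id": 1}, {"order_id": None}]

print(rule.name, rule.check(good_rows), rule.check(bad_rows))
```

Keeping the check as a plain callable means the same `Rule` record could wrap a SQL query, a dbt test result, or an in-memory assertion.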
### 1.3 Quality Policy Statement
> No dataset advances from {{SOURCE_LAYER}} to {{TARGET_LAYER}} without passing all **Critical** and **High** severity rules defined in its contract. Exceptions require a documented waiver from the Data Owner and {{GOVERNANCE_TEAM}}.
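
One way to read the policy statement as a quality gate is sketched below. It assumes rule results arrive as simple records and that waivers are tracked as a set of rule names; `RuleResult`, `blocking_failures`, and `can_promote` are hypothetical names, and how waivers are documented and approved is outside this sketch.

```python
from dataclasses import dataclass

# Per the policy statement, only Critical and High failures block promotion.
BLOCKING_SEVERITIES = {"critical", "high"}


@dataclass(frozen=True)
class RuleResult:
    rule_name: str
    severity: str  # "critical" | "high" | "medium" | "low"
    passed: bool


def blocking_failures(results, waived_rules=frozenset()):
    """Return names of failed Critical/High rules that have no waiver."""
    return [
        r.rule_name
        for r in results
        if not r.passed
        and r.severity in BLOCKING_SEVERITIES
        and r.rule_name not in waived_rules
    ]


def can_promote(results, waived_rules=frozenset()):
    """True when nothing blocks promotion to the target layer."""
    return not blocking_failures(results, waived_rules)


results = [
    RuleResult("orders_id_not_null", "critical", False),
    RuleResult("amount_in_range", "medium", False),
    RuleResult("orders_fresh", "high", True),
]

print(blocking_failures(results))
print(can_promote(results, waived_rules={"orders_id_not_null"}))
```

Medium and Low failures are deliberately ignored by the gate here; they would still be reported, just not used to halt promotion.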
---
## 2. Data Quality Dimensions
Align every rule to one of the following standard dimensions (the list is extensible):
| Dimension | Question Answered | Example Rules |
|-----------|-------------------|---------------|
| **Completeness** | Is required data present? | `NOT NULL`, row count vs baseline, null rate threshold |
| **Uniqueness** | Are keys and grains correct? | Primary key uniqueness, duplicate detection |
| **Validity** | Do values conform to format/range? | Regex, enum, min/max, referential checks |
| **Accuracy** | Does data reflect reality? | Reconciliation to source system totals |
| **Consistency** | Is data aligned across systems? | Cross-table balance, dimension conformance |
| **Timeliness** | Is data fresh enough? | `max(updated_at)` SLA, landing latency |
| **Integrity** | Are relationships preserved? | FK existence, orphan detection |
Each data product contract must map at least one rule to every dimension that applies to the dataset.
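
A few of the example rules in the table, plus the contract coverage requirement, could look roughly like this. The sketch works on in-memory rows for brevity; the function names, the sample columns (`id`, `email`, `updated_at`), and the thresholds are all illustrative assumptions, and production checks would typically run as SQL or dbt tests instead.

```python
from datetime import datetime, timedelta, timezone


def null_rate(rows, column):
    """Completeness: share of rows where `column` is missing or None."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def duplicate_keys(rows, key):
    """Uniqueness: key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        value = r[key]
        if value in seen:
            dupes.add(value)
        else:
            seen.add(value)
    return dupes


def is_fresh(rows, column, max_age, now):
    """Timeliness: max(column) is no older than max_age; empty input is stale."""
    rows = list(rows)
    if not rows:
        return False
    return now - max(r[column] for r in rows) <= max_age


def uncovered_dimensions(rule_dimensions, applicable):
    """Applicable dimensions that no rule in the contract is mapped to."""
    return sorted(set(applicable) - set(rule_dimensions.values()))


utc = timezone.utc
rows = [
    {"id": 1, "email": "a@example.com", "updated_at": datetime(2024, 1, 1, 12, tzinfo=utc)},
    {"id": 2, "email": None, "updated_at": datetime(2024, 1, 1, 13, tzinfo=utc)},
    {"id": 2, "email": "c@example.com", "updated_at": datetime(2024, 1, 1, 11, tzinfo=utc)},
]
now = datetime(2024, 1, 1, 14, tzinfo=utc)

contract = {"id_not_null": "completeness", "id_unique": "uniqueness"}
print(uncovered_dimensions(contract, ["completeness", "uniqueness", "timeliness"]))
```

Running the coverage check in CI against each contract is one possible way to enforce the mapping requirement before a data product ships.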
---
## 3. Rule Taxonomy
### 3.1 Rule Categories
```
RULE TAXONOMY
├── SCHEMA RULES → structure, types, required columns
├── CONTENT RULES → value-level assertions
├── VOLUME RULES → row count, byte size, partition coverage
├── FRESHNESS RULES → latency, watermark age
├── RECONCILIATION RULES → source vs target aggregates
└── ANOMALY RULES → statistical deviation from baseline
```
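
To make the categories more concrete, here is a hedged sketch of one rule each for the schema, volume, reconciliation, and anomaly branches. The function names, default tolerances, and the z-score approach for anomaly detection are assumptions chosen for illustration; the framework does not prescribe a specific statistical method.

```python
from statistics import mean, stdev


def missing_columns(actual_columns, required_columns):
    """SCHEMA rule: required columns absent from the dataset."""
    return sorted(set(required_columns) - set(actual_columns))


def volume_within_tolerance(row_count, baseline, tolerance=0.2):
    """VOLUME rule: row count within +/- tolerance (fraction) of a baseline."""
    return abs(row_count - baseline) <= tolerance * baseline


def totals_reconcile(source_total, target_total, abs_tol=0.01):
    """RECONCILIATION rule: source and target aggregates agree within abs_tol."""
    return abs(source_total - target_total) <= abs_tol


def is_anomalous(value, history, z_threshold=3.0):
    """ANOMALY rule: value is more than z_threshold sample std devs from the mean."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        return value != history[0]  # flat history: any change is a deviation
    return abs(value - mean(history)) / sigma > z_threshold


daily_row_counts = [100, 102, 98, 101, 99]
print(missing_columns(["id", "amount"], ["id", "amount", "updated_at"]))
print(is_anomalous(1000, daily_row_counts))
```

Content and freshness rules follow the same pattern: a small pure function that returns a pass/fail value or a list of violations, which a quality gate can then evaluate by severity.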
// ... download full template for remaining code

How to use this framework
- Download the full document and review with your platform/architecture team
- Replace organization-specific placeholders (team names, AWS accounts, domains)
- Map each section to your current-state vs target-state gap analysis
- Use as an RFC or architecture decision record (ADR) starting point
Tags: data quality, governance, dbt, great expectations, operating model
Updated: Jul 2, 2026