
An end-to-end operating framework for data quality in modern data engineering (DE) teams: dimensions, rule taxonomy, execution layers, ownership, escalation paths, and KPI scorecards.


Replace {{PLACEHOLDERS}} with your environment values, then deploy to your stack.

# Data Quality Engineering Framework

**Version:** {{FRAMEWORK_VERSION}}  
**Owner:** {{DATA_ENGINEERING_TEAM}}  
**Last Updated:** {{LAST_UPDATED_DATE}}  
**Applies To:** {{ORGANIZATION_NAME}} - all production data pipelines and data products

---

## Executive Summary

This framework provides an end-to-end **Data Quality (DQ)** operating model for data engineering teams. It defines quality dimensions, a rule taxonomy, execution layers (dbt, Great Expectations, SQL), ownership, escalation paths, metrics, and operating rhythms so quality is **designed in**, not bolted on after incidents.

**Goals:**

- Prevent bad data from reaching gold/consumption layers
- Make quality expectations explicit, testable, and versioned
- Assign clear ownership for detection, triage, and remediation
- Measure and improve quality systematically over time

---

## 1. Scope and Definitions

### 1.1 In Scope

- Batch and micro-batch pipelines (dbt, Spark, Glue, SQL)
- Streaming pipelines with quality checkpoints (see companion streaming framework)
- Source-to-gold path for curated data products
- Third-party and SaaS ingest with SLA-backed validation

### 1.2 Key Definitions

| Term | Definition |
|------|------------|
| **Data Quality Rule** | Executable assertion tied to a dataset column, row set, or pipeline stage |
| **Rule Severity** | Critical / High / Medium / Low; determines whether a failure blocks promotion |
| **Quality Gate** | Pipeline stage that halts promotion when a Critical or High rule fails |
| **Data Owner** | Accountable party for business correctness of a dataset |
| **Pipeline Owner** | Engineering team responsible for implementation and monitoring |
| **Incident** | Sustained or critical failure affecting downstream consumers |
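
The severity and incident definitions can be made concrete in a few lines. The sketch below is illustrative only: `Severity`, `RuleResult`, `is_blocking`, and `is_incident` are hypothetical names, it assumes (per the policy in 1.3) that Critical and High failures block while Medium and Low only alert, and it interprets "sustained" as failing in each of the last three runs, which is an assumed threshold rather than part of the framework.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Assumption: Critical/High failures block promotion; Medium/Low only alert.
BLOCKING = {Severity.CRITICAL, Severity.HIGH}


@dataclass(frozen=True)
class RuleResult:
    rule_id: str
    dataset: str
    severity: Severity
    passed: bool


def is_blocking(result: RuleResult) -> bool:
    """A failed rule halts the quality gate only if its severity is blocking."""
    return not result.passed and result.severity in BLOCKING


def is_incident(history: list[RuleResult], sustained_runs: int = 3) -> bool:
    """Incident if the latest run failed a Critical rule, or the rule failed in
    each of the last `sustained_runs` runs (hypothetical threshold).
    `history` holds one rule's results in chronological order."""
    if not history:
        return False
    latest = history[-1]
    if not latest.passed and latest.severity is Severity.CRITICAL:
        return True
    recent = history[-sustained_runs:]
    return len(recent) == sustained_runs and all(not r.passed for r in recent)
```

Keeping "blocking" and "incident" as separate predicates lets a Medium rule page the Pipeline Owner after repeated failures without ever halting promotion.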

### 1.3 Quality Policy Statement

> No dataset advances from {{SOURCE_LAYER}} to {{TARGET_LAYER}} without passing all **Critical** and **High** severity rules defined in its contract. Exceptions require a documented waiver from the Data Owner and {{GOVERNANCE_TEAM}}.
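
A promotion gate enforcing this policy might look like the minimal sketch below. It assumes rule results and waivers have already been collected as plain records; `RuleResult`, `Waiver`, and `can_promote` are hypothetical names, and treating a waiver as valid only when it carries both Data Owner and governance approval plus a written reason is one reading of the policy, not a prescribed format.

```python
from dataclasses import dataclass

BLOCKING_SEVERITIES = {"critical", "high"}


@dataclass(frozen=True)
class RuleResult:
    rule_id: str
    severity: str  # "critical" | "high" | "medium" | "low"
    passed: bool


@dataclass(frozen=True)
class Waiver:
    rule_id: str
    approved_by_data_owner: bool
    approved_by_governance: bool
    reason: str

    def is_valid(self) -> bool:
        # A waiver counts only with both sign-offs and a documented reason.
        return (self.approved_by_data_owner
                and self.approved_by_governance
                and bool(self.reason.strip()))


def can_promote(results: list[RuleResult], waivers: list[Waiver]) -> tuple[bool, list[str]]:
    """Return (allowed, unwaived blocking rule ids) for a layer promotion."""
    waived = {w.rule_id for w in waivers if w.is_valid()}
    blocking = sorted(
        r.rule_id
        for r in results
        if not r.passed
        and r.severity in BLOCKING_SEVERITIES
        and r.rule_id not in waived
    )
    return (not blocking, blocking)
```

Returning the list of blocking rule ids alongside the boolean gives the orchestrator something concrete to log and attach to the incident ticket.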

---

## 2. Data Quality Dimensions

Align every rule to one of the standard dimensions below (the list is extensible):

| Dimension | Question Answered | Example Rules |
|-----------|-------------------|---------------|
| **Completeness** | Is required data present? | `NOT NULL`, row count vs baseline, null rate threshold |
| **Uniqueness** | Are keys and grains correct? | Primary key uniqueness, duplicate detection |
| **Validity** | Do values conform to format/range? | Regex, enum, min/max, referential checks |
| **Accuracy** | Does data reflect reality? | Reconciliation to source system totals |
| **Consistency** | Is data aligned across systems? | Cross-table balance, dimension conformance |
| **Timeliness** | Is data fresh enough? | `max(updated_at)` SLA, landing latency |
| **Integrity** | Are relationships preserved? | FK existence, orphan detection |

Each data product contract must map at least one rule per applicable dimension.
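
Contract coverage can be checked mechanically in CI. This sketch assumes a contract is a dict listing its applicable dimensions and its rules, each rule tagged with one dimension; that shape, `EXAMPLE_CONTRACT`, and `missing_dimensions` are hypothetical, not a defined contract format.

```python
DIMENSIONS = {
    "completeness", "uniqueness", "validity", "accuracy",
    "consistency", "timeliness", "integrity",
}

# Hypothetical contract shape, for illustration only.
EXAMPLE_CONTRACT = {
    "dataset": "gold.orders",
    "applicable_dimensions": ["completeness", "uniqueness", "timeliness"],
    "rules": [
        {"id": "orders_id_not_null", "dimension": "completeness"},
        {"id": "orders_pk_unique", "dimension": "uniqueness"},
    ],
}


def missing_dimensions(contract: dict) -> list[str]:
    """Return applicable dimensions that have no rule mapped to them."""
    applicable = set(contract["applicable_dimensions"])
    unknown = applicable - DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    covered = {rule["dimension"] for rule in contract["rules"]}
    return sorted(applicable - covered)


print(missing_dimensions(EXAMPLE_CONTRACT))  # ['timeliness']
```

Failing the build when this list is non-empty keeps the "at least one rule per applicable dimension" requirement from drifting as contracts evolve.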

---

## 3. Rule Taxonomy

### 3.1 Rule Categories

```
RULE TAXONOMY
├── SCHEMA RULES          → structure, types, required columns
├── CONTENT RULES         → value-level assertions
├── VOLUME RULES          → row count, byte size, partition coverage
├── FRESHNESS RULES       → latency, watermark age
├── RECONCILIATION RULES  → source vs target aggregates
└── ANOMALY RULES         → statistical deviation from baseline
```
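
One possible way to express four of these categories as checks is sketched below. The thresholds (a 20% volume tolerance, an absolute reconciliation tolerance of 0.01, a z-score limit of 3) and the function names are assumptions chosen for illustration. The anomaly rule uses a simple z-score against historical values, which is only one of many baseline techniques. Content and freshness rules are omitted for brevity.

```python
import statistics


def schema_check(actual: dict[str, str], expected: dict[str, str]) -> list[str]:
    """SCHEMA: list missing columns and type mismatches (empty list = pass)."""
    problems = []
    for col, typ in expected.items():
        if col not in actual:
            problems.append(f"missing column {col}")
        elif actual[col] != typ:
            problems.append(f"{col}: expected {typ}, got {actual[col]}")
    return problems


def volume_check(row_count: int, baseline: int, tolerance: float = 0.2) -> bool:
    """VOLUME: pass if row_count is within +/- tolerance (a fraction) of baseline."""
    return abs(row_count - baseline) <= tolerance * baseline


def reconciliation_check(source_total: float, target_total: float,
                         abs_tol: float = 0.01) -> bool:
    """RECONCILIATION: pass if source and target aggregates agree within abs_tol."""
    return abs(source_total - target_total) <= abs_tol


def anomaly_check(history: list[float], value: float, z_max: float = 3.0) -> bool:
    """ANOMALY: pass if value lies within z_max sample standard deviations
    of the historical mean."""
    if len(history) < 2:
        return True  # too little history to establish a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value == mean  # flat baseline: any change counts as anomalous
    return abs(value - mean) / stdev <= z_max
```

In practice `baseline` and `history` would be read from metrics persisted by previous pipeline runs, so volume and anomaly rules adapt as the dataset grows.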


// ... download full template for remaining code

## How to Use This Framework


- Download the full document and review it with your platform/architecture team
- Replace organization-specific placeholders (team names, AWS accounts, domains)
- Map each section to a current-state vs. target-state gap analysis
- Use it as a starting point for an RFC or architecture decision record (ADR)
Updated: Jul 2, 2026