Data Governance for Practitioners
Hands-on governance for practitioners: ownership, classification, access, and catalog practices.
Governance That Ships
Lightweight Governance for Practitioners
Governance often conjures images of committees, multi-month approvals, and blocked projects. Effective practitioner governance is different: minimal process that prevents real risks while keeping teams moving. The goal is confidence in data, not compliance theater.
Named Ownership
Every Tier 1 (business-critical) dataset needs a named owner: a team or individual accountable for quality, documentation, and access decisions. Ownership appears in the catalog, not buried in a wiki.
Owners respond to incidents, approve access requests within the agreed SLA, and review breaking change notifications.
Without owners, datasets become orphaned and quality decays.
Data Contracts Between Producers and Consumers
A data contract specifies schema, grain, SLAs, and breaking change policy between upstream producers and downstream consumers.
Example contract clause: "Adding nullable columns is non-breaking; renaming columns requires 30-day deprecation notice and consumer acknowledgment."
Contracts can live as YAML in the repo alongside dbt sources, where CI can enforce them.
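A minimal sketch of what enforcing such a contract might look like once it has been loaded into memory. The `Contract` class, its field names, and the `violations` helper are illustrative assumptions, not a standard format; a real setup would load the values from the YAML file and check warehouse rows or column metadata.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    """Hypothetical in-memory form of a data contract."""
    dataset: str
    grain: tuple               # columns that uniquely identify one row
    columns: dict              # column name -> expected Python type
    freshness_sla_hours: int
    rename_notice_days: int = 30


def violations(contract, rows):
    """Return human-readable contract violations for a list of row dicts."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Schema check: every contracted column is present with the right type.
        for name, expected in contract.columns.items():
            if name not in row:
                problems.append(f"row {i}: missing column {name}")
            elif row[name] is not None and not isinstance(row[name], expected):
                problems.append(f"row {i}: {name} is not {expected.__name__}")
        # Grain check: the grain columns must be unique across rows.
        key = tuple(row.get(c) for c in contract.grain)
        if key in seen:
            problems.append(f"row {i}: duplicate grain key {key}")
        seen.add(key)
    return problems
```

An empty list means the sample conforms; a CI job could fail the build whenever the list is non-empty.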
PR Checks for Breaking Changes
Automate detection of breaking schema changes in pull requests: dropped columns, type narrowing, grain changes. Block merge or require explicit approval from listed consumers.
Tools: dbt state comparison, schema diff actions, catalog integration.
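The core of such a check is a diff between the schema on the main branch and the schema in the pull request. The sketch below assumes each schema is available as a plain mapping of column name to type string; the `SAFE_WIDENINGS` set and the function name are hypothetical, and grain changes are not covered here.

```python
# Assumed list of type changes that cannot lose data; extend per warehouse.
SAFE_WIDENINGS = {("int", "bigint"), ("float", "double")}


def breaking_changes(old, new):
    """Compare two {column: type} mappings and list breaking changes.

    Added columns are treated as non-breaking. A rename shows up as a
    dropped column, so it is flagged too.
    """
    changes = []
    for col, old_type in old.items():
        if col not in new:
            changes.append(f"dropped column: {col}")
        elif new[col] != old_type and (old_type, new[col]) not in SAFE_WIDENINGS:
            changes.append(f"type change on {col}: {old_type} -> {new[col]}")
    return changes
```

A pull request check could post the returned list as a comment and require approval from the listed consumers whenever it is non-empty.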
Governance That Accelerates
Good governance answers questions quickly: "Can I use this dataset?" "Who approves access?" "Is this column PII?" Teams ship faster when the answers are self-serve in the catalog rather than locked in tribal knowledge.
Key Takeaways
- Assign named owners to critical datasets with visible accountability.
- Use data contracts to formalize producer-consumer expectations.
- Automate breaking change detection in CI.
- Design governance for self-serve speed, not committee delay.
Reflection
What governance rule actually speeds teams up instead of blocking them? What rule exists only on paper?
Classification and Access Control
Protecting Data Without Blocking Work
Classification tags data by sensitivity (public, internal, confidential, PII, regulated). Access control enforces who can read or write based on role, need, and classification. Together they reduce breach risk while enabling legitimate analytics.
Classification Taxonomy
Keep the taxonomy small and actionable:
| Level | Examples | Default access |
|---|---|---|
| Public | Marketing aggregates | Broad read |
| Internal | Operational metrics | Employees |
| Confidential | Revenue detail | Finance + leadership |
| PII | Email, phone, SSN | Masked or role-restricted |
| Regulated | Health records (HIPAA), cardholder data (PCI DSS) | Legal/compliance approval |
Tag at the column level where sensitivity varies within a table.
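One way to represent column-level tags is an ordered enum, so a table's effective sensitivity can be derived from its columns. The ordering below (PII above Confidential, Regulated highest) is an assumption for illustration, as are the function names.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumed ordering: a higher value means more restrictive handling.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    PII = 3
    REGULATED = 4


def table_level(column_tags):
    """Treat a table as being as sensitive as its most sensitive column."""
    return max(column_tags.values())


def sensitive_columns(column_tags, threshold):
    """List columns tagged at or above the given level, sorted by name."""
    return sorted(c for c, lvl in column_tags.items() if lvl >= threshold)
```

For example, a table with an `email` column tagged `Level.PII` is itself treated as PII even when every other column is Internal.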
Role-Based and Attribute-Based Access
RBAC assigns permissions to roles (analyst, finance_analyst, data_engineer). ABAC adds conditions (department, region, clearance).
Implement policies consistently in the warehouse (Snowflake RBAC, BigQuery IAM), the lake (Lake Formation LF-Tags), and BI tools.
Avoid maintaining conflicting copies of a policy across layers; document which layer is the source of truth.
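The difference between the two models can be sketched in a few lines: the role lookup is the RBAC part, and the region comparison is an ABAC condition layered on top. The grants table, class names, and the choice of region as the attribute are assumptions for illustration; in practice the warehouse or lake enforces these rules.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical role -> allowed classifications (the RBAC part).
ROLE_GRANTS = {
    "analyst": {"public", "internal"},
    "finance_analyst": {"public", "internal", "confidential"},
    "data_engineer": {"public", "internal", "confidential", "pii"},
}


@dataclass
class User:
    name: str
    role: str
    region: str


@dataclass
class Dataset:
    name: str
    classification: str
    region: Optional[str] = None  # None means no regional restriction


def can_read(user, dataset):
    """Allow a read only if both the role grant and the attribute condition pass."""
    role_ok = dataset.classification in ROLE_GRANTS.get(user.role, set())
    region_ok = dataset.region is None or dataset.region == user.region
    return role_ok and region_ok
```

Unknown roles get an empty grant set, so the default is deny.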
Least Privilege and Access Reviews
Grant the minimum access needed. Run quarterly access reviews for PII and regulated data, in which managers confirm that their direct reports still require access.
Automate provisioning via request workflows with approval routing to data owners.
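A review cycle starts from a list of grants that are due. The sketch below assumes each grant is a dict with `classification` and `last_reviewed` keys and uses 90 days as a stand-in for "quarterly"; both the record shape and the interval are illustrative.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)       # assumed stand-in for "quarterly"
REVIEWED_CLASSES = {"pii", "regulated"}    # classifications that need review


def grants_due_for_review(grants, today):
    """Return sensitive grants whose last review is at least one interval old."""
    return [
        g for g in grants
        if g["classification"] in REVIEWED_CLASSES
        and today - g["last_reviewed"] >= REVIEW_INTERVAL
    ]
```

The resulting list can be grouped by manager and sent out as confirmation requests.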
Masking and Tokenization
For PII columns, apply dynamic masking (partial email, hashed identifier) for roles without full access. Tokenization replaces sensitive values with tokens that can be reversed only for approved use cases.
Analysts can then explore patterns without seeing raw PII.
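The three techniques behave differently, which a small sketch makes concrete. Everything here is illustrative: warehouses usually apply masking through their own policy features, and the in-memory `TokenVault` is a hypothetical stand-in for a secured tokenization service.

```python
import hashlib
import hmac
import secrets


def mask_email(email):
    """Partial mask: keep the first character and the domain."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def hash_identifier(value, key):
    """Keyed hash: stable, so it still works as a join key, but not reversible."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenVault:
    """Reversible tokenization backed by an in-memory lookup (demo only)."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._to_token:
            token = "tok_" + secrets.token_hex(8)
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def detokenize(self, token):
        # In a real system this call would be restricted to approved roles.
        return self._to_value[token]
```

Masking and hashing lose information by design; only the vault can return the original value, which is why access to `detokenize` is the part that needs approval.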
Key Takeaways
- Use a simple classification taxonomy applied at column level.
- Enforce consistent RBAC/ABAC across warehouse, lake, and BI.
- Conduct periodic access reviews for sensitive data.
- Apply masking so analysts work safely with PII-adjacent datasets.
Reflection
Can you identify which columns in your most-used mart are PII? Who has access today, and is that access reviewed?
Catalog and Metadata Practices
Making Governance Discoverable
A catalog without adoption is empty metadata. Practitioner governance embeds catalog usage into daily workflows so documentation and lineage stay current.
Minimum Viable Metadata
For every production dataset, require:
- Description and grain
- Owner and Slack channel
- Classification tags
- Freshness SLA
- Link to dbt model or pipeline repo
Optional but valuable: sample queries, related dashboards, known limitations.
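The required list above translates directly into a completeness check. The field names below are assumptions about how a catalog entry might be keyed, not any particular catalog's schema.

```python
# Assumed keys for the required items; adjust to the catalog's own field names.
REQUIRED_FIELDS = (
    "description",
    "grain",
    "owner",
    "slack_channel",
    "classification",
    "freshness_sla",
    "repo_link",
)


def missing_metadata(entry):
    """Return required fields that are absent or empty in a catalog entry."""
    return [f for f in REQUIRED_FIELDS if not entry.get(f)]
```

An entry is production-ready when `missing_metadata` returns an empty list.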
Lineage and Impact Analysis
Automated lineage from dbt, Spark, or ingestion tools shows upstream sources and downstream dashboards. Use lineage before schema changes to notify affected consumers proactively.
Impact analysis reduces surprise breakages and builds trust in platform changes.
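Impact analysis amounts to a graph traversal over lineage edges. The sketch below assumes lineage has already been exported as `(upstream, downstream)` pairs; the function name and the edge format are illustrative.

```python
from collections import deque


def downstream(edges, start):
    """Return every node reachable from start, sorted by name.

    edges is an iterable of (upstream, downstream) pairs.
    """
    graph = {}
    for up, down in edges:
        graph.setdefault(up, []).append(down)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:   # guard against revisiting shared or cyclic nodes
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

Joining the result against catalog ownership gives the list of teams to notify before a schema change ships.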
Search and Discovery UX
If analysts cannot find a dataset in under two minutes, they go back to asking engineers. Improve search with consistent naming, business glossary terms synced to the catalog, and curated "golden" dataset collections.
Data stewards curate domain collections; they do not write every description personally.
Governance Metrics
Track catalog health:
- Percentage of Tier 1 datasets with complete metadata
- Average age of stale descriptions (those unchanged since the source schema last changed)
- Access request turnaround time
- Percentage of datasets with assigned owner
Review monthly; escalate gaps to domain leads.
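Several of these metrics can be computed from a catalog export. The sketch below covers three of them and assumes each dataset is a dict with `tier`, `owner`, and `metadata_complete` keys, with access request turnaround supplied separately in hours; all of these names are illustrative.

```python
from statistics import mean


def _pct(part, whole):
    """Percentage rounded to one decimal; 0.0 when the denominator is empty."""
    return round(100 * len(part) / len(whole), 1) if whole else 0.0


def catalog_health(datasets, turnaround_hours):
    """Summarize catalog health from dataset records and request turnaround times."""
    tier1 = [d for d in datasets if d.get("tier") == 1]
    complete = [d for d in tier1 if d.get("metadata_complete")]
    owned = [d for d in datasets if d.get("owner")]
    return {
        "tier1_complete_pct": _pct(complete, tier1),
        "owned_pct": _pct(owned, datasets),
        "avg_access_turnaround_hours": mean(turnaround_hours) if turnaround_hours else None,
    }
```

Tracking the same dictionary month over month makes gaps visible without a manual audit.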
Integrating Catalog with CI
Block dbt merges if new marts lack descriptions. Sync dbt schema.yml docs to the catalog automatically on deploy.
A single source of truth prevents documentation drift between the repo and the catalog.
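The merge-blocking check can be a short script in CI. The sketch below operates on a dict shaped like a parsed dbt schema.yml (a top-level `models` list whose items have `name` and `description`); parsing the YAML file and wiring the exit code into the pipeline are left out, and the function names are hypothetical.

```python
def models_missing_descriptions(schema):
    """Return names of models whose description is absent or blank."""
    return [
        m["name"]
        for m in schema.get("models", [])
        if not (m.get("description") or "").strip()
    ]


def ci_exit_code(schema):
    """Return 1 (fail the build) if any model lacks a description, else 0."""
    missing = models_missing_descriptions(schema)
    for name in missing:
        print(f"missing description: {name}")
    return 1 if missing else 0
```

A CI step would pass the returned value to `sys.exit` so that a non-zero code blocks the merge.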
Key Takeaways
- Require minimum metadata for production datasets: owner, grain, classification, SLA.
- Use lineage for proactive consumer notification on changes.
- Measure catalog completeness and fix gaps systematically.
- Sync documentation from dbt to the catalog in CI so the repo stays the single source of truth.
Reflection
Is your catalog a living system integrated with CI, or a stale snapshot from a one-time crawl? What would increase analyst trust in catalog search?