Data Mesh on AWS with Lake Formation
Complete implementation framework for domain-oriented data mesh on AWS — Lake Formation federated governance, cross-account data products, domain ownership, and phased rollout.
Replace {{PLACEHOLDERS}} with your environment values, then deploy to your stack.
# Data Mesh on AWS with Lake Formation
**Version:** {{FRAMEWORK_VERSION}}
**Owner:** {{DATA_PLATFORM_TEAM}}
**Last Updated:** {{LAST_UPDATED_DATE}}
**Status:** {{DRAFT | REVIEW | APPROVED}}
---
## Executive Summary
This framework defines how {{ORGANIZATION_NAME}} implements a **Data Mesh** architecture on AWS using **Amazon S3**, **AWS Glue**, and **AWS Lake Formation (LF)**. It establishes domain ownership, data product contracts, federated governance, and cross-account access patterns so teams can publish and consume data products at scale without central bottlenecks.
**Target outcomes:**
- Domains own their data products end-to-end (schema, quality, SLAs, documentation)
- Central platform provides shared infrastructure, standards, and guardrails
- Lake Formation enforces fine-grained access via LF-Tags and resource links
- Cross-account consumption is auditable, repeatable, and least-privilege
---
## 1. Principles and Non-Negotiables
| Principle | Description |
|-----------|-------------|
| Domain ownership | Business domains (not the platform team) own data product lifecycle |
| Data as a product | Every published dataset has an owner, contract, SLA, and discoverability |
| Self-serve infrastructure | Domains provision via templates; platform does not hand-build pipelines |
| Federated computational governance | Global policies + domain-local enforcement via LF-Tags and IAM |
| Interoperability | Standard formats (Parquet/Iceberg), catalog metadata, and naming conventions |
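
To illustrate the "domain-local enforcement via LF-Tags" principle, the following is a minimal sketch of a request builder for a tag-based grant. It assumes hypothetical tag keys (`domain`, `classification`), a placeholder role ARN, and example account IDs; none of these come from this framework. The returned dictionary mirrors the keyword arguments of boto3's `lakeformation.grant_permissions`, but the sketch itself makes no AWS call.

```python
from typing import Dict, List


def build_lf_tag_grant(
    principal_arn: str,
    catalog_id: str,
    tag_expression: Dict[str, List[str]],
    permissions: List[str],
) -> dict:
    """Build keyword arguments for an LF-Tag policy grant on tables."""
    if not tag_expression:
        raise ValueError("an LF-Tag expression needs at least one tag key")
    # Sort keys and values so the same policy always yields the same request.
    expression = [
        {"TagKey": key, "TagValues": sorted(values)}
        for key, values in sorted(tag_expression.items())
    ]
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "CatalogId": catalog_id,
                "ResourceType": "TABLE",
                "Expression": expression,
            }
        },
        "Permissions": list(permissions),
    }


# Hypothetical values, for illustration only.
request = build_lf_tag_grant(
    principal_arn="arn:aws:iam::111122223333:role/analyst",
    catalog_id="444455556666",
    tag_expression={"domain": ["sales"], "classification": ["public", "internal"]},
    permissions=["SELECT", "DESCRIBE"],
)
```

With credentials configured, a request like this could be passed as `boto3.client("lakeformation").grant_permissions(**request)`. Keeping the builder separate from the call makes the policy reviewable and testable before anything is applied.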
**Non-negotiables for {{ORGANIZATION_NAME}}:**
1. No data product goes to production without a contract registered in {{DATA_CATALOG}} (e.g., AWS Glue Data Catalog, Amazon DataZone)
2. All S3 data lake paths must be registered in Lake Formation before any grants are issued
3. PII and regulated data must carry an LF-Tag classification before publication
4. Cross-account access is granted only through Lake Formation resource links or AWS RAM-shared databases; S3 bucket policies must not serve as the primary access control
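
The four rules above lend themselves to an automated pre-publish gate. The following is a minimal sketch, assuming a hypothetical `DataProduct` descriptor whose fields (`contract_registered`, `s3_location_registered`, and so on) are invented for illustration; in practice these values would be looked up in the catalog and in Lake Formation rather than declared by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Cross-account mechanisms permitted by rule 4 ("none" means no sharing yet).
ALLOWED_MECHANISMS = {"none", "resource_link", "ram_share"}


@dataclass
class DataProduct:
    """Hypothetical descriptor of a data product awaiting publication."""
    name: str
    contract_registered: bool
    s3_location_registered: bool
    contains_regulated_data: bool
    lf_tags: Dict[str, str] = field(default_factory=dict)
    cross_account_mechanism: str = "none"


def publish_violations(product: DataProduct) -> List[str]:
    """Return the non-negotiables a product breaks (empty list = publishable)."""
    violations = []
    if not product.contract_registered:
        violations.append("1: no registered contract")
    if not product.s3_location_registered:
        violations.append("2: S3 path not registered in Lake Formation")
    if product.contains_regulated_data and "classification" not in product.lf_tags:
        violations.append("3: regulated data lacks an LF-Tag classification")
    if product.cross_account_mechanism not in ALLOWED_MECHANISMS:
        violations.append("4: cross-account access bypasses resource links / RAM shares")
    return violations


# Hypothetical product that breaks rules 3 and 4.
orders = DataProduct(
    name="sales.orders",
    contract_registered=True,
    s3_location_registered=True,
    contains_regulated_data=True,
    cross_account_mechanism="bucket_policy",
)
for violation in publish_violations(orders):
    print(violation)
```

A check of this kind could run in a CI step or a Step Functions task ahead of publication, so a violation blocks the release instead of surfacing in a later audit.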
---
## 2. Reference Architecture
```
┌─────────────────────────────────────────────────────────────────────────┐
│                       CENTRAL GOVERNANCE ACCOUNT                        │
│  Lake Formation Admin │ LF-Tag Policy Store │ Audit (CloudTrail/LF)     │
│  Shared Glue Catalog │ DataZone / Business Glossary                     │
└───────────────────────────────┬─────────────────────────────────────────┘
                                │ LF Grants + Resource Links
        ┌───────────────────────┼───────────────────────┐
        ▼                       ▼                       ▼
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ DOMAIN ACCT A │       │ DOMAIN ACCT B │       │ DOMAIN ACCT C │
│ S3 Data Lake  │       │ S3 Data Lake  │       │ S3 Data Lake  │
│ Glue ETL/Jobs │       │ Glue/Spark    │       │ dbt on EMR/   │
│ Data Products │       │ Data Products │       │ Athena        │
└───────┬───────┘       └───────┬───────┘       └───────┬───────┘
        │                       │                       │
        └───────────────────────┼───────────────────────┘
                                ▼
                    ┌───────────────────────┐
                    │ CONSUMPTION ACCOUNTS  │
                    │ BI / ML / Analytics   │
                    │ Athena / Redshift /   │
                    │ Spark / SageMaker     │
                    └───────────────────────┘
```
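
The "LF Grants + Resource Links" edge in the diagram can be illustrated from the consuming side. The following is a minimal sketch, assuming hypothetical account IDs and database names; the returned dictionary follows the shape that boto3's `glue.create_database` accepts for a resource link (a `DatabaseInput` carrying a `TargetDatabase`), and no AWS call is made here.

```python
def _check_account_id(account_id: str) -> str:
    """AWS account IDs are 12-digit strings."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    return account_id


def build_resource_link_request(
    consumer_account_id: str,
    link_name: str,
    owner_account_id: str,
    owner_database: str,
) -> dict:
    """Build kwargs for a database resource link in the consumer's catalog."""
    return {
        # Catalog in which the link is created (the consuming account).
        "CatalogId": _check_account_id(consumer_account_id),
        "DatabaseInput": {
            "Name": link_name,
            # Shared database that the link points to.
            "TargetDatabase": {
                "CatalogId": _check_account_id(owner_account_id),
                "DatabaseName": owner_database,
            },
        },
    }


# Hypothetical values, for illustration only.
link_request = build_resource_link_request(
    consumer_account_id="111122223333",
    link_name="sales_orders_link",
    owner_account_id="444455556666",
    owner_database="sales_orders",
)
```

A resource link only makes the shared database addressable in the consumer's catalog; it grants no access to the underlying data by itself, so the owning side still has to issue the corresponding Lake Formation grants.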
### Core AWS Services
| Layer | Service | Role |
|-------|---------|------|
| Storage | Amazon S3 | Domain-scoped buckets/prefixes per data product |
| Catalog | AWS Glue Data Catalog | Tables, schemas, lineage hooks |
| Governance | Lake Formation | Grants, LF-Tags, row/column filters, audit |
| Compute | Glue, EMR, Athena, Redshift Spectrum | Transform and query |
| Orchestration | Step Functions, Airflow (MWAA) | Pipeline scheduling |
| Discovery | {{DATA_CATALOG}} / DataZone | Product registry and contracts |
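
The storage and governance rows meet at location registration: a data product's S3 prefix has to be registered with Lake Formation before grants can reference it. The following is a minimal sketch, assuming a hypothetical `<domain>/<product>` prefix layout and placeholder bucket and role names; the returned dictionary uses the `ResourceArn` and `RoleArn` keyword arguments of boto3's `lakeformation.register_resource`, and nothing is sent to AWS.

```python
import re

# Hypothetical naming rule: lowercase letters, digits, hyphens, underscores.
_NAME = re.compile(r"^[a-z0-9][a-z0-9_-]*$")


def build_register_request(bucket: str, domain: str, product: str, role_arn: str) -> dict:
    """Build kwargs for registering one data product prefix with Lake Formation."""
    for label, value in (("domain", domain), ("product", product)):
        if not _NAME.match(value):
            raise ValueError(f"invalid {label} name: {value!r}")
    # Assumed layout: one prefix per data product under its domain.
    prefix = f"{domain}/{product}"
    return {
        "ResourceArn": f"arn:aws:s3:::{bucket}/{prefix}",
        "RoleArn": role_arn,
    }


# Hypothetical values, for illustration only.
register_request = build_register_request(
    bucket="example-domain-a-lake",
    domain="sales",
    product="orders",
    role_arn="arn:aws:iam::111122223333:role/lf-registration",
)
```

Registering at the product prefix instead of the whole bucket keeps each registration aligned with a single data product, which suits the domain-scoped layout described in the table.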
// ... download full template for remaining code
How to use this framework
- Download the full document and review with your platform/architecture team
- Replace organization-specific placeholders (team names, AWS accounts, domains)
- Map each section to your current-state vs target-state gap analysis
- Use as an RFC or architecture decision record (ADR) starting point
Tags: data mesh, aws, lake formation, federated governance, domain ownership
Updated: Jul 2, 2026