Data Mesh on AWS with Lake Formation

Complete implementation framework for domain-oriented data mesh on AWS — Lake Formation federated governance, cross-account data products, domain ownership, and phased rollout.


Replace {{PLACEHOLDERS}} with your environment values, then deploy to your stack.

# Data Mesh on AWS with Lake Formation

**Version:** {{FRAMEWORK_VERSION}}  
**Owner:** {{DATA_PLATFORM_TEAM}}  
**Last Updated:** {{LAST_UPDATED_DATE}}  
**Status:** {{DRAFT | REVIEW | APPROVED}}

---

## Executive Summary

This framework defines how {{ORGANIZATION_NAME}} implements a **Data Mesh** architecture on AWS using **Amazon S3**, **AWS Glue**, and **AWS Lake Formation (LF)**. It establishes domain ownership, data product contracts, federated governance, and cross-account access patterns so teams can publish and consume data products at scale without central bottlenecks.

**Target outcomes:**

- Domains own their data products end-to-end (schema, quality, SLAs, documentation)
- Central platform provides shared infrastructure, standards, and guardrails
- Lake Formation enforces fine-grained access through LF-Tag-based grants, and resource links expose shared tables in consumer accounts
- Cross-account consumption is auditable, repeatable, and least-privilege

---

## 1. Principles and Non-Negotiables

| Principle | Description |
|-----------|-------------|
| Domain ownership | Business domains (not the platform team) own the data product lifecycle |
| Data as a product | Every published dataset has an owner, contract, SLA, and discoverability |
| Self-serve infrastructure | Domains provision via templates; platform does not hand-build pipelines |
| Federated computational governance | Global policies + domain-local enforcement via LF-Tags and IAM |
| Interoperability | Standard formats (Parquet/Iceberg), catalog metadata, and naming conventions |
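
The federated governance row can be made concrete with a small model of how an LF-Tag grant is evaluated. The sketch below is plain Python, not a Lake Formation API call. It assumes the usual LF-Tag semantics: a resource carries one value per tag key, and a grant expression matches when every key in the expression is present on the resource with one of the accepted values. The tag keys `domain` and `sensitivity` and their values are hypothetical examples of a globally defined tag vocabulary.

```python
from typing import Dict, Set

# Tags attached to a catalog resource: one value per tag key.
ResourceTags = Dict[str, str]
# Grant expression: each tag key maps to the set of values the grant accepts.
TagExpression = Dict[str, Set[str]]


def expression_matches(expression: TagExpression, tags: ResourceTags) -> bool:
    """True when every key in the expression is present on the resource
    with one of the accepted values (AND across keys, OR across values)."""
    return all(tags.get(key) in accepted for key, accepted in expression.items())


# Hypothetical grant: analysts may read non-PII tables of the sales domain.
analyst_grant: TagExpression = {
    "domain": {"sales"},
    "sensitivity": {"public", "internal"},
}

# Hypothetical tag assignments made by the owning domain.
orders_tags: ResourceTags = {"domain": "sales", "sensitivity": "internal"}
customers_tags: ResourceTags = {"domain": "sales", "sensitivity": "pii"}

print(expression_matches(analyst_grant, orders_tags))
print(expression_matches(analyst_grant, customers_tags))
```

In this model the central team owns the tag vocabulary and the grant expressions, while each domain only decides which tags its tables carry.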

**Non-negotiables for {{ORGANIZATION_NAME}}:**

1. No production data product without a registered contract in {{DATA_CATALOG}} (e.g., AWS Glue Data Catalog, DataZone)
2. All S3 data lake paths must be registered in Lake Formation before grant issuance
3. PII and regulated data require LF-Tag classification before publish
4. Cross-account access is granted only via LF resource links or RAM-shared databases; bucket policies must not serve as the primary access control
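
The four rules above lend themselves to an automated publish gate. The following is a minimal sketch of such a check, assuming a hypothetical `DataProduct` record and a list of S3 locations already registered in Lake Formation. The field names, the `sensitivity` tag key, and the mechanism labels are illustrative and not part of this framework.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical labels for the two sharing mechanisms allowed by rule 4.
ALLOWED_SHARE_MECHANISMS = {"lf_resource_link", "ram_shared_database"}


@dataclass
class DataProduct:
    name: str
    s3_location: str
    contract_id: Optional[str] = None
    contains_pii: bool = False
    lf_tags: Dict[str, str] = field(default_factory=dict)
    share_mechanism: str = "lf_resource_link"


def publish_violations(product: DataProduct, registered_locations: List[str]) -> List[str]:
    """Return the non-negotiables a product breaks; an empty list means it may publish."""
    violations = []
    if not product.contract_id:
        violations.append("1: no registered contract")
    # The product path must sit under a location registered in Lake Formation.
    if not any(product.s3_location.startswith(loc) for loc in registered_locations):
        violations.append("2: S3 path not registered in Lake Formation")
    if product.contains_pii and "sensitivity" not in product.lf_tags:
        violations.append("3: PII without LF-Tag classification")
    if product.share_mechanism not in ALLOWED_SHARE_MECHANISMS:
        violations.append("4: cross-account access outside LF/RAM sharing")
    return violations


registered = ["s3://example-sales-lake/products/"]
draft = DataProduct(
    name="customers",
    s3_location="s3://example-sales-lake/scratch/customers/",
    contains_pii=True,
    share_mechanism="bucket_policy",
)
for violation in publish_violations(draft, registered):
    print(violation)
```

A check like this could run in a CI step or a pre-publish hook, so a domain sees the violations before the platform team has to review anything.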

---

## 2. Reference Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                     CENTRAL GOVERNANCE ACCOUNT                          │
│  Lake Formation Admin │ LF-Tag Policy Store │ Audit (CloudTrail/LF)    │
│  Shared Glue Catalog  │ DataZone / Business Glossary                    │
└───────────────────────────────┬─────────────────────────────────────────┘
                                │ LF Grants + Resource Links
        ┌───────────────────────┼───────────────────────┐
        ▼                       ▼                       ▼
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ DOMAIN ACCT A │       │ DOMAIN ACCT B │       │ DOMAIN ACCT C │
│ S3 Data Lake  │       │ S3 Data Lake  │       │ S3 Data Lake  │
│ Glue ETL/Jobs │       │ Glue/Spark    │       │ dbt on EMR/   │
│ Data Products │       │ Data Products │       │ Athena        │
└───────┬───────┘       └───────┬───────┘       └───────┬───────┘
        │                       │                       │
        └───────────────────────┼───────────────────────┘
                                ▼
                    ┌───────────────────────┐
                    │ CONSUMPTION ACCOUNTS  │
                    │ BI / ML / Analytics   │
                    │ Athena / Redshift /   │
                    │ Spark / SageMaker     │
                    └───────────────────────┘
```
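
The "LF Grants + Resource Links" arrow in the diagram corresponds to two API calls: a Lake Formation grant issued from the account that holds the catalog, and a Glue resource link created in the consuming account. The sketch below only builds the request parameters, so it runs without AWS credentials. The parameter shapes follow the boto3 `grant_permissions` and `create_table` calls as commonly documented, but verify them against the current SDK reference before use. The account IDs, tag values, and database and table names are placeholders.

```python
from typing import Any, Dict, List


def lf_tag_grant_request(
    catalog_account_id: str,
    consumer_account_id: str,
    tag_key: str,
    tag_values: List[str],
) -> Dict[str, Any]:
    """Keyword arguments for Lake Formation grant_permissions: SELECT and
    DESCRIBE on every table whose LF-Tags match the expression."""
    return {
        "CatalogId": catalog_account_id,
        "Principal": {"DataLakePrincipalIdentifier": consumer_account_id},
        "Resource": {
            "LFTagPolicy": {
                "CatalogId": catalog_account_id,
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": tag_values}],
            }
        },
        "Permissions": ["SELECT", "DESCRIBE"],
    }


def resource_link_request(
    link_database: str,
    link_name: str,
    catalog_account_id: str,
    source_database: str,
    source_table: str,
) -> Dict[str, Any]:
    """Keyword arguments for Glue create_table, run in the consumer account,
    that create a resource link pointing at the shared table."""
    return {
        "DatabaseName": link_database,
        "TableInput": {
            "Name": link_name,
            "TargetTable": {
                "CatalogId": catalog_account_id,
                "DatabaseName": source_database,
                "Name": source_table,
            },
        },
    }


# Placeholder account IDs and names, for illustration only.
grant = lf_tag_grant_request("111122223333", "444455556666", "domain", ["sales"])
link = resource_link_request(
    "shared_sales", "orders_link", "111122223333", "sales", "orders"
)

# With credentials for the respective accounts, the requests would be sent as:
#   boto3.client("lakeformation").grant_permissions(**grant)
#   boto3.client("glue").create_table(**link)
print(grant["Resource"]["LFTagPolicy"]["Expression"])
print(link["TableInput"]["TargetTable"])
```

Keeping the request builders separate from the client calls makes the grants easy to review and test before anything is applied.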

### Core AWS Services

| Layer | Service | Role |
|-------|---------|------|
| Storage | Amazon S3 | Domain-scoped buckets/prefixes per data product |
| Catalog | AWS Glue Data Catalog | Tables, schemas, lineage hooks |
| Governance | Lake Formation | Grants, LF-Tags, row/column filters, audit |
| Compute | Glue, EMR, Athena, Redshift Spectrum | Transform and query |
| Orchestration | Step Functions, Airflow (MWAA) | Pipeline scheduling |
| Discovery | {{DATA_CATALOG}} / DataZone | Product registry and contracts |



How to use this framework

  • Download the full document and review with your platform/architecture team
  • Replace organization-specific placeholders (team names, AWS accounts, domains)
  • Map each section to your current-state vs target-state gap analysis
  • Use as an RFC or architecture decision record (ADR) starting point
Updated: Jul 2, 2026