Lakehouse Batch Ingestion Workflow
Medallion bronze/silver/gold batch ingestion workflow with quality gates, quarantine paths, and operational runbooks for lakehouse platforms.
Architecture Diagram
[Figure: Lakehouse Batch Ingestion on AWS, a medallion architecture with quality gates. AWS reference layout with grouped regions, numbered flows, and service icons. Main path (orange): extract → validate → bronze → transform → silver → marts → SQL API; dashed paths: quarantine, monitoring, catalog.]
# Lakehouse Batch Ingestion Workflow
> DE Architecture · {{ORGANIZATION_NAME}}
## Overview
Batch ingestion workflow for a medallion lakehouse: land raw data → validate → curate → publish to the consumption zone, with a quality gate at each layer transition.
## Workflow Diagram
```
Source Systems
       │
       ▼
┌─────────────┐  quality gate   ┌─────────────┐  transform   ┌─────────────┐
│   Bronze    │ ──────────────▶ │   Silver    │ ───────────▶ │    Gold     │
│  Raw land   │  schema + vol   │  Conformed  │  aggregates  │  Marts/API  │
└─────────────┘                 └─────────────┘              └─────────────┘
       │                               │                            │
       └─────── quarantine ────────────┴────── observability ───────┘
```
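The gated flow in the diagram can be sketched as a loop that promotes a partition one layer at a time and stops at the first gate that fails. This is a minimal Python sketch, assuming each gate is a plain function returning `(passed, reason)`; `promote`, `PromotionResult`, and that gate signature are hypothetical names, not part of any orchestrator's API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical gate signature: takes a partition id, returns (passed, reason).
Gate = Callable[[str], tuple[bool, str]]


@dataclass
class PromotionResult:
    partition: str
    reached: list[str] = field(default_factory=list)  # layers the partition was promoted into
    blocked: bool = False
    reason: str = ""


def promote(partition: str, gates: list[tuple[str, Gate]]) -> PromotionResult:
    """Promote a partition layer by layer, stopping at the first failed gate."""
    result = PromotionResult(partition)
    for target_layer, gate in gates:
        passed, reason = gate(partition)
        if not passed:
            # Stop: the partition is not promoted into target_layer; the runbook
            # decides whether to quarantine it or hold it for investigation.
            result.blocked = True
            result.reason = f"{target_layer}: {reason}"
            return result
        # In a real pipeline, the write into target_layer would happen here.
        result.reached.append(target_layer)
    return result
```

In Airflow, the same short-circuit typically comes from task dependencies: with the default trigger rule, tasks downstream of a failed check task do not run.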
## Bronze Layer Workflow
1. **Extract** - Pull from {{SOURCE_SYSTEM}} on the {{CRON}} schedule
2. **Land** - Write Parquet files to `s3://{{BUCKET}}/bronze/{{DOMAIN}}/{{ENTITY}}/dt={{DATE}}/`
3. **Register** - Add or update the table in the Glue Data Catalog or Unity Catalog, with partition keys
4. **Validate** - Compare the landed row count with the source, and the landed schema hash with the expected schema hash
5. **Gate** - Fail the pipeline if row-count variance exceeds {{VARIANCE_THRESHOLD}}%
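The Land, Validate, and Gate steps can be sketched in plain Python as below. This is a minimal sketch, assuming the source row count and both schemas (as column-name-to-type dicts) are already available; `bronze_path`, `schema_hash`, `row_count_variance_pct`, `bronze_gate`, and `GateFailure` are hypothetical helpers. Only row-count variance fails the run, matching the Gate step; a schema-hash mismatch is returned as a flag so the schema-drift runbook path can quarantine the partition.

```python
import hashlib
import json


class GateFailure(Exception):
    """Raised to fail the pipeline run when the bronze gate trips."""


def bronze_path(bucket: str, domain: str, entity: str, date: str) -> str:
    # Hive-style dt= partition prefix used by the Land step.
    return f"s3://{bucket}/bronze/{domain}/{entity}/dt={date}/"


def schema_hash(columns: dict[str, str]) -> str:
    # Sort columns so the hash ignores column order (a design choice).
    canonical = json.dumps(sorted(columns.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def row_count_variance_pct(source_rows: int, landed_rows: int) -> float:
    # Convention (assumption): any rows against an empty source count as 100%.
    if source_rows == 0:
        return 0.0 if landed_rows == 0 else 100.0
    return abs(landed_rows - source_rows) * 100 / source_rows


def bronze_gate(source_rows: int, landed_rows: int,
                expected_schema: dict[str, str], landed_schema: dict[str, str],
                threshold_pct: float) -> dict:
    variance = row_count_variance_pct(source_rows, landed_rows)
    if variance > threshold_pct:
        raise GateFailure(f"row-count variance {variance:.2f}% exceeds {threshold_pct}%")
    return {
        "variance_pct": variance,
        # False signals schema drift; route to quarantine rather than failing here.
        "schema_match": schema_hash(expected_schema) == schema_hash(landed_schema),
    }
```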
## Silver Layer Workflow
1. Read only new bronze partitions, tracked with a job bookmark or watermark
2. Apply schema enforcement and type coercion
3. Deduplicate on natural key `{{PK_COLUMN}}`
4. Apply business rules (valid enums, date ranges)
5. Write to `s3://{{BUCKET}}/silver/{{DOMAIN}}/{{ENTITY}}/`
6. Emit DQ metrics to {{OBSERVABILITY_TOOL}}
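Steps 1, 3, 4, and 6 can be sketched over in-memory rows as below; in practice they would typically run in {{SPARK_ENGINE}} or dbt. This is an illustration only, assuming ISO-dated `dt=` partition names (so string comparison orders them), a hypothetical `updated_at` column to pick the latest duplicate, and hypothetical `status` and `event_date` columns for the business rules. Rejected rows are returned rather than dropped, on the assumption that you want to quarantine or report them.

```python
from datetime import date

# Example enum for the business-rule step; replace with your own valid values.
VALID_STATUSES = {"active", "inactive"}


def new_partitions(partitions: list[str], watermark: str) -> list[str]:
    # With ISO dates in "dt=YYYY-MM-DD", string order equals date order.
    return sorted(p for p in partitions if p > watermark)


def dedupe_latest(rows: list[dict], key: str) -> list[dict]:
    # Keep one row per natural key: the version with the greatest updated_at.
    latest: dict = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row[key]] = row
    return list(latest.values())


def apply_rules(rows: list[dict], min_date: date,
                max_date: date) -> tuple[list[dict], list[dict]]:
    # Split rows into (valid, rejected) using an enum check and a date range.
    valid, rejected = [], []
    for row in rows:
        ok = row["status"] in VALID_STATUSES and min_date <= row["event_date"] <= max_date
        (valid if ok else rejected).append(row)
    return valid, rejected


def silver_metrics(rows_in: int, rows_deduped: int, rows_valid: int) -> dict[str, int]:
    # Counts to emit to the observability tool after the silver write.
    return {
        "rows_in": rows_in,
        "duplicates_removed": rows_in - rows_deduped,
        "rule_failures": rows_deduped - rows_valid,
    }
```

After a successful write, advance the watermark to the newest partition processed so the next run skips it.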
## Gold Layer Workflow
1. Aggregate and enrich silver tables into domain marts
2. Apply row-level security policies where required
3. Publish as external or native tables in {{WAREHOUSE}}
4. Notify downstream consumers via {{EVENT_BUS}}
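Steps 1 and 4 can be sketched as a group-by aggregation followed by a notification payload. The column names, the `gold.table.published` event type, and the payload fields are hypothetical; adapt them to your mart design and to whatever schema {{EVENT_BUS}} consumers expect.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def build_mart(silver_rows: list[dict], group_by: str, measure: str) -> list[dict]:
    # Sum one measure per group and count the contributing rows.
    totals: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for row in silver_rows:
        totals[row[group_by]] += row[measure]
        counts[row[group_by]] += 1
    return [
        {group_by: k, f"total_{measure}": totals[k], "row_count": counts[k]}
        for k in sorted(totals)
    ]


def published_event(domain: str, table: str, partition: str, row_count: int) -> str:
    # Hypothetical payload announcing a newly published gold partition.
    return json.dumps({
        "event_type": "gold.table.published",
        "domain": domain,
        "table": table,
        "partition": partition,
        "row_count": row_count,
        "published_at": datetime.now(timezone.utc).isoformat(),
    })
```

Sending the string is left to your event bus client (for example, an Amazon EventBridge `PutEvents` call), which is outside this sketch.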
## Operational Runbook
| Event | Action | Owner |
|-------|--------|-------|
| Schema drift | Quarantine partition, alert steward | {{DATA_STEWARD}} |
| Volume anomaly | Hold promotion to gold, investigate | {{DE_ONCALL}} |
| SLA miss | Escalate to {{ESCALATION_CONTACT}} | Platform |
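If alerts are routed programmatically, the runbook table can be kept as a lookup keyed by event type so every alert resolves to an action and an owner. A minimal sketch; the event-type keys, `RunbookEntry`, and `route` are hypothetical, and the owner strings are left as the template's placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunbookEntry:
    action: str
    owner: str


# Mirrors the runbook table above; replace placeholders during customization.
RUNBOOK = {
    "schema_drift": RunbookEntry("quarantine partition, alert steward", "{{DATA_STEWARD}}"),
    "volume_anomaly": RunbookEntry("hold promotion to gold, investigate", "{{DE_ONCALL}}"),
    "sla_miss": RunbookEntry("escalate to {{ESCALATION_CONTACT}}", "Platform"),
}


def route(event_type: str) -> RunbookEntry:
    # Unknown event types fail loudly instead of being silently dropped.
    try:
        return RUNBOOK[event_type]
    except KeyError:
        raise ValueError(f"no runbook entry for event type {event_type!r}") from None
```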
## Tools Mapping
- Orchestration: {{ORCHESTRATOR}} (e.g., Airflow or AWS Glue workflows)
- Processing: {{SPARK_ENGINE}} and/or dbt
- Catalog: {{CATALOG}} (e.g., AWS Glue Data Catalog or Databricks Unity Catalog)
- Quality: {{DQ_TOOL}}
## Customization
Plug in your bucket paths, domain names, PK columns, and alerting channels.
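Before using the template, it can help to fail fast on any `{{PLACEHOLDER}}` left unfilled. This sketch assumes placeholder names use only uppercase letters, digits, and underscores, as in this template; `fill_placeholders` is a hypothetical helper.

```python
import re

# Matches {{NAME}} where NAME is uppercase letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    # Report every missing value at once instead of failing on the first.
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise ValueError(f"unfilled placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```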
How to use this architecture
- Use in architecture review meetings or RFC documents
- Map each component to your cloud accounts, teams, and tools
- Replace {{PLACEHOLDERS}} with environment-specific values
- Extend workflow steps with your org's SLAs and governance gates
Updated: Jul 2, 2026