Lakehouse Batch Ingestion Workflow

Medallion bronze/silver/gold batch ingestion workflow with quality gates, quarantine paths, and operational runbooks for lakehouse platforms.

Data Engineering Architectures · Intermediate · Workflow Template

Architecture Diagram

AWS reference layout with grouped regions, numbered flows, and official service icons.

[Diagram: Lakehouse Batch Ingestion on AWS, a medallion architecture with quality gates. Groups: External Sources, Ingestion Pipeline, Bronze Layer, Silver Layer, Gold Layer, Governance & Observability. Components: Amazon RDS (operational DB), Amazon AppFlow (SaaS sources), AWS DMS, AWS Glue crawler, Amazon S3 bronze (raw land), AWS Glue ETL job (transform), Amazon S3 silver (conformed), Amazon Redshift (marts), Amazon Athena (SQL API), Amazon S3 quarantine, Amazon CloudWatch (metrics and alarms), AWS Lake Formation (access control), AWS Glue Data Catalog. Numbered flows 1 to 7 include SaaS extract, quality gate, and publish; side flows carry failed records and lineage.]

Main path (orange): extract → validate → bronze → transform → silver → marts → SQL API · Dashed: quarantine, monitoring, catalog

Code preview

Replace {{PLACEHOLDERS}} with your environment values, then deploy to your stack.

# Lakehouse Batch Ingestion Workflow

> DE Architecture · {{ORGANIZATION_NAME}}

## Overview

Batch ingestion workflow for a medallion lakehouse: land raw data in bronze, validate it, curate it into silver, and publish it to the gold consumption zone, with a quality gate at each transition.

## Workflow Diagram

```
 Source Systems
      │
      ▼
┌─────────────┐   quality gate   ┌─────────────┐   transform   ┌─────────────┐
│  Bronze     │ ───────────────▶ │   Silver    │ ────────────▶ │    Gold     │
│  Raw land   │   schema + vol   │  Conformed  │   aggregates  │  Marts/API  │
└─────────────┘                  └─────────────┘               └─────────────┘
      │                                │                              │
      └──────── quarantine ────────────┴─────── observability ───────┘
```

## Bronze Layer Workflow

1. **Extract** - Pull from {{SOURCE_SYSTEM}} on schedule {{CRON}}
2. **Land** - Write Parquet to `s3://{{BUCKET}}/bronze/{{DOMAIN}}/{{ENTITY}}/dt={{DATE}}/`
3. **Register** - Register the table and its partition keys in the Glue or Unity catalog
4. **Validate** - Compare the landed row count with the source and compare schema hashes
5. **Gate** - Fail the pipeline if the row-count variance exceeds {{VARIANCE_THRESHOLD}}%
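
The Validate and Gate steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`bronze_path`, `schema_hash`, `row_count_variance_pct`, `bronze_gate`) are hypothetical, the schema is assumed to be available as `(column, type)` pairs, and treating a schema hash mismatch as a gate failure is an assumption rather than something the template states.

```python
import hashlib
from datetime import date


def bronze_path(bucket: str, domain: str, entity: str, run_date: date) -> str:
    """Build the dt-partitioned landing prefix used in the Land step."""
    return f"s3://{bucket}/bronze/{domain}/{entity}/dt={run_date.isoformat()}/"


def schema_hash(columns: list[tuple[str, str]]) -> str:
    """Hash (column name, type) pairs; sorting makes the hash ignore column order."""
    canonical = "|".join(f"{name}:{dtype}" for name, dtype in sorted(columns))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def row_count_variance_pct(source_rows: int, landed_rows: int) -> float:
    """Absolute row-count difference as a percentage of the source count."""
    if source_rows == 0:
        return 0.0 if landed_rows == 0 else float("inf")
    return abs(landed_rows - source_rows) * 100.0 / source_rows


def bronze_gate(
    source_rows: int,
    landed_rows: int,
    expected_schema: list[tuple[str, str]],
    landed_schema: list[tuple[str, str]],
    variance_threshold_pct: float,
) -> list[str]:
    """Return the list of gate failures; an empty list means the partition may proceed."""
    failures = []
    variance = row_count_variance_pct(source_rows, landed_rows)
    if variance > variance_threshold_pct:
        failures.append(
            f"row count variance {variance:.2f}% exceeds {variance_threshold_pct}%"
        )
    # Assumption: any schema hash difference blocks promotion.
    if schema_hash(expected_schema) != schema_hash(landed_schema):
        failures.append("schema hash mismatch")
    return failures
```

An orchestrator task could call `bronze_gate` after the landing job and raise if the returned list is non-empty. Sorting columns before hashing is a design choice that tolerates column reordering; drop the `sorted` call if column order matters in your environment.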

## Silver Layer Workflow

1. Read incremental bronze partitions (bookmark/watermark)
2. Apply schema enforcement and type coercion
3. Deduplicate on natural key `{{PK_COLUMN}}`
4. Apply business rules (valid enums, date ranges)
5. Write to `s3://{{BUCKET}}/silver/{{DOMAIN}}/{{ENTITY}}/`
6. Emit DQ metrics to {{OBSERVABILITY_TOOL}}
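
As a rough sketch of steps 3, 4, and 6, the function below deduplicates on a natural key, applies two example business rules, and returns DQ metrics. It works on plain dictionaries rather than Spark or dbt, and everything specific in it is an assumption made for illustration: the `status` and `order_date` columns, the `VALID_STATUSES` enum, the `MIN_DATE` bound, the "keep the latest row per key" rule, and the idea of routing rule failures to a quarantine list.

```python
from datetime import date

# Hypothetical business rules; replace with your domain's enums and ranges.
VALID_STATUSES = {"open", "shipped", "cancelled"}
MIN_DATE = date(2000, 1, 1)


def curate(rows: list[dict], pk_column: str, updated_column: str, today: date):
    """Deduplicate on pk_column, apply business rules, and return
    (clean_rows, quarantined_rows, dq_metrics)."""
    # Keep only the most recently updated row per natural key.
    latest: dict = {}
    for row in rows:
        key = row[pk_column]
        if key not in latest or row[updated_column] > latest[key][updated_column]:
            latest[key] = row

    clean, quarantined = [], []
    for row in latest.values():
        reasons = []
        if row["status"] not in VALID_STATUSES:
            reasons.append("invalid status")
        if not (MIN_DATE <= row["order_date"] <= today):
            reasons.append("order_date out of range")
        if reasons:
            # Attach the failure reasons so the quarantine path is debuggable.
            quarantined.append({**row, "_dq_reasons": reasons})
        else:
            clean.append(row)

    metrics = {
        "rows_in": len(rows),
        "duplicates_dropped": len(rows) - len(latest),
        "rows_clean": len(clean),
        "rows_quarantined": len(quarantined),
    }
    return clean, quarantined, metrics
```

The `metrics` dictionary is the kind of payload that could be forwarded to {{OBSERVABILITY_TOOL}}; the exact metric names here are placeholders.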

## Gold Layer Workflow

1. Aggregate/enrich silver into domain marts
2. Apply row-level security policies if needed
3. Publish to {{WAREHOUSE}} external tables or native tables
4. Notify downstream consumers via {{EVENT_BUS}}
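
To make steps 1 and 4 concrete, here is a small sketch that aggregates silver rows into a daily mart and builds a notification payload. The column names (`order_date`, `amount_cents`), the event name `gold_partition_published`, and the payload fields are hypothetical; the message format your {{EVENT_BUS}} expects may differ.

```python
import json
from collections import defaultdict
from datetime import date


def build_daily_mart(silver_rows: list[dict]) -> list[dict]:
    """Aggregate silver rows into one mart row per order_date."""
    totals = defaultdict(lambda: {"order_count": 0, "amount_cents": 0})
    for row in silver_rows:
        bucket = totals[row["order_date"]]
        bucket["order_count"] += 1
        bucket["amount_cents"] += row["amount_cents"]
    return [{"order_date": day, **totals[day]} for day in sorted(totals)]


def publish_event(domain: str, entity: str, partition: date, row_count: int) -> str:
    """Serialize a 'partition published' notification for downstream consumers."""
    payload = {
        "event": "gold_partition_published",  # hypothetical event name
        "domain": domain,
        "entity": entity,
        "partition": partition.isoformat(),
        "row_count": row_count,
    }
    return json.dumps(payload, sort_keys=True)
```

Amounts are kept as integer cents in this sketch to avoid floating-point drift in the aggregates.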

## Operational Runbook

| Event | Action | Owner |
|-------|--------|-------|
| Schema drift | Quarantine partition, alert steward | {{DATA_STEWARD}} |
| Volume anomaly | Hold promotion to gold, investigate | {{DE_ONCALL}} |
| SLA miss | Escalate to {{ESCALATION_CONTACT}} | Platform |
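
One way to keep the runbook machine-readable is a lookup table that an alert handler consults. The sketch below mirrors the three rows above; the event keys (`schema_drift`, `volume_anomaly`, `sla_miss`) and the `RunbookEntry` and `route` names are hypothetical, and the owner strings are left as unrendered placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunbookEntry:
    action: str
    owner: str


# Mirrors the runbook table; placeholders are filled in at deploy time.
RUNBOOK = {
    "schema_drift": RunbookEntry("Quarantine partition, alert steward", "{{DATA_STEWARD}}"),
    "volume_anomaly": RunbookEntry("Hold promotion to gold, investigate", "{{DE_ONCALL}}"),
    "sla_miss": RunbookEntry("Escalate to {{ESCALATION_CONTACT}}", "Platform"),
}


def route(event_type: str) -> RunbookEntry:
    """Look up the action and owner for an operational event."""
    try:
        return RUNBOOK[event_type]
    except KeyError:
        raise ValueError(f"no runbook entry for event {event_type!r}") from None
```

Raising on unknown event types, rather than returning a default, keeps unmapped alerts from being silently dropped.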

## Tools Mapping

- Orchestration: {{ORCHESTRATOR}} (Airflow/Glue workflows)
- Processing: {{SPARK_ENGINE}} / dbt
- Catalog: {{CATALOG}} (Glue Data Catalog/Databricks Unity Catalog)
- Quality: {{DQ_TOOL}}

## Customization

Plug in your bucket paths, domain names, PK columns, and alerting channels.
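
Placeholder substitution can be scripted so that no `{{NAME}}` token is left unresolved by accident. The helper below is a minimal sketch; the `render` function and the assumption that placeholder names use only uppercase letters and underscores are illustrative, not part of the template.

```python
import re

# Matches {{NAME}} where NAME is uppercase letters and underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace every {{NAME}} with values[NAME]; fail if any name is missing."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"unresolved placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

For example, `render("s3://{{BUCKET}}/bronze/{{DOMAIN}}/", {"BUCKET": "lake", "DOMAIN": "sales"})` produces `s3://lake/bronze/sales/`, while a template that still contains an unmapped token raises `KeyError`.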

How to use this architecture

  • Use in architecture review meetings or RFC documents
  • Map each component to your cloud accounts, teams, and tools
  • Replace {{PLACEHOLDERS}} with environment-specific values
  • Extend workflow steps with your org's SLAs and governance gates
Updated: Jul 2, 2026