Modern Data Stack Reference Architecture
End-to-end reference architecture for ingestion, dbt transformation, Snowflake warehouse, Airflow orchestration, and BI consumption — with workflow steps and ownership matrix.
[Figure: Modern Data Stack on AWS (ingestion → transform → warehouse → BI), an AWS reference layout with grouped regions and numbered flows. Orchestrated by MWAA + Step Functions + EventBridge; swap Redshift for Snowflake/BigQuery if needed.]
# Modern Data Stack Reference Architecture
> DE Architecture · Workflow template for {{ORGANIZATION_NAME}}
## Purpose
Reference architecture for a modern analytics stack: ingestion → warehouse (Snowflake/BigQuery) → transformation (dbt) → consumption (BI/ML), with Airflow orchestrating the stages end to end.
## Architecture Diagram
```
┌───────────────┐    ┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Sources       │───▶│ Ingestion     │───▶│ Raw/Bronze    │───▶│ dbt Staging   │
│ SaaS, DBs     │    │ Fivetran/Air  │    │ S3/BQ/SF      │    │ stg_*         │
└───────────────┘    └───────────────┘    └───────────────┘    └───────┬───────┘
                                                                       │
┌───────────────┐    ┌───────────────┐    ┌───────────────┐            │
│ BI / ML       │◀───│ Marts         │◀───│ dbt Int/Marts │◀───────────┘
│ Looker/Hex    │    │ fct_/dim_     │    │ int_*, marts  │
└───────────────┘    └───────────────┘    └───────────────┘
                                                  ▲
                                          ┌───────┴───────┐
                                          │ Airflow       │
                                          │ Scheduler     │
                                          └───────────────┘
```
## Component Responsibilities
| Layer | Tool | Owner | SLA |
|-------|------|-------|-----|
| Ingestion | {{INGESTION_TOOL}} | {{INGESTION_TEAM}} | {{INGESTION_SLA}} |
| Storage | {{WAREHOUSE}} | {{PLATFORM_TEAM}} | 99.9% |
| Transform | dbt | {{ANALYTICS_ENG_TEAM}} | {{DBT_SLA}} |
| Orchestrate | Airflow | {{DE_TEAM}} | {{ORCHESTRATION_SLA}} |
| Consume | {{BI_TOOL}} | {{ANALYTICS_TEAM}} | Business hours |
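An availability target such as the 99.9% storage SLA is easier to reason about when converted into a downtime budget. A minimal sketch, assuming a 30-day measurement window; the window length and the `downtime_budget_minutes` name are illustrative choices, not part of the template:

```python
def downtime_budget_minutes(sla_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLA tolerates over a window.

    The 30-day default window is an assumption; use your SLA's actual period.
    """
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in (0, 100]")
    window_minutes = window_days * 24 * 60
    # The error budget is the share of the window the SLA does not promise.
    return window_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    # 99.9% over 30 days (43,200 minutes) leaves about 43.2 minutes.
    print(round(downtime_budget_minutes(99.9), 1))  # prints 43.2
```

The same function can be applied to `{{INGESTION_SLA}}` or `{{DBT_SLA}}` once those placeholders are expressed as percentages.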
## End-to-End Workflow
### Step 1 - Source onboarding
1. Register source in catalog with owner and classification
2. Configure connector with least-privilege credentials
3. Land raw data in `{{RAW_DATABASE}}.{{RAW_SCHEMA}}`
4. Validate row counts and schema against contract
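The row-count and schema validation in item 4 can be prototyped as a check of a landed batch against a data contract. This sketch assumes the contract is a mapping of column name to Python type plus a minimum row count; the `Contract` class and `validate_batch` function are hypothetical, and in practice the rows would come from a query against `{{RAW_DATABASE}}.{{RAW_SCHEMA}}` rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Contract:
    """Hypothetical data contract: expected column types plus a row-count floor."""
    columns: dict[str, type]
    min_rows: int = 1
    allow_extra_columns: bool = False


def validate_batch(rows: list[dict[str, Any]], contract: Contract) -> list[str]:
    """Return human-readable contract violations; an empty list means the batch passes."""
    errors: list[str] = []
    if len(rows) < contract.min_rows:
        errors.append(f"row count {len(rows)} below minimum {contract.min_rows}")
    for i, row in enumerate(rows):
        missing = contract.columns.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        extra = row.keys() - contract.columns.keys()
        if extra and not contract.allow_extra_columns:
            errors.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col, expected in contract.columns.items():
            value = row.get(col)
            # Nulls are not type errors here; leave them to not_null tests downstream.
            if value is not None and not isinstance(value, expected):
                errors.append(
                    f"row {i}: column {col!r} is {type(value).__name__}, "
                    f"expected {expected.__name__}"
                )
    return errors


if __name__ == "__main__":
    orders_contract = Contract(columns={"order_id": int, "amount": float})
    batch = [{"order_id": 1, "amount": 9.5}, {"order_id": "2", "amount": 3.0}]
    for problem in validate_batch(batch, orders_contract):
        print(problem)  # prints: row 1: column 'order_id' is str, expected int
```

Returning a list of violations instead of raising on the first one lets the onboarding step report every contract breach in a single pass.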
### Step 2 - Staging & modeling
1. Create `stg_{{ENTITY}}` with source-aligned cleaning
2. Build `int_{{ENTITY}}` for joins and business logic
3. Publish `fct_{{ENTITY}}` / `dim_{{ENTITY}}` marts
4. Run dbt tests (unique, not_null, relationships, custom DQ)
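The generic tests named in item 4 reduce to simple set checks. The sketch below mirrors the *semantics* of dbt's `unique`, `not_null`, and `relationships` tests over in-memory rows so the intent is easy to see; it is an illustration only, since dbt itself compiles these tests to SQL that runs in the warehouse. The function names and the list-of-dicts row format are assumptions.

```python
from collections import Counter
from typing import Any, Iterable


def not_null(rows: Iterable[dict[str, Any]], column: str) -> list[dict[str, Any]]:
    """Rows whose column is null; a not_null test fails when this is non-empty."""
    return [r for r in rows if r.get(column) is None]


def unique(rows: Iterable[dict[str, Any]], column: str) -> list[Any]:
    """Non-null values that appear more than once (nulls are ignored, as in SQL)."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def relationships(
    child_rows: Iterable[dict[str, Any]],
    column: str,
    parent_rows: Iterable[dict[str, Any]],
    field: str,
) -> list[Any]:
    """Non-null child keys with no matching parent key (orphaned foreign keys)."""
    parent_keys = {r[field] for r in parent_rows}
    return [
        r[column]
        for r in child_rows
        if r.get(column) is not None and r[column] not in parent_keys
    ]


if __name__ == "__main__":
    customers = [{"customer_id": 1}, {"customer_id": 2}]
    orders = [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 10, "customer_id": 3},
        {"order_id": None, "customer_id": None},
    ]
    print(unique(orders, "order_id"))  # [10]
    print(not_null(orders, "order_id"))  # [{'order_id': None, 'customer_id': None}]
    print(relationships(orders, "customer_id", customers, "customer_id"))  # [3]
```

Each function returns the failing records, which matches dbt's convention that a test passes when its query returns zero rows.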
### Step 3 - Orchestration
1. The Airflow DAG starts once the ingestion-completion sensor succeeds
2. Run `dbt run` then `dbt test` (or `dbt build`, which runs both in DAG order) from CI/CD or an Airflow `BashOperator`
3. On failure: page {{ONCALL_CHANNEL}}, block downstream marts
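Blocking downstream marts means skipping every model that depends, directly or transitively, on the failed one. A minimal sketch, assuming lineage is available as a mapping from each model to its direct parents (for example, built from dbt's `manifest.json`); the model names and the `downstream_of` function are hypothetical:

```python
from collections import deque


def downstream_of(failed: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every model that transitively depends on `failed`."""
    # Invert the parent map into a child map so we can walk downstream.
    children: dict[str, list[str]] = {}
    for model, model_parents in parents.items():
        for parent in model_parents:
            children.setdefault(parent, []).append(model)

    # Breadth-first traversal from the failed model.
    blocked: set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in blocked:
                blocked.add(child)
                queue.append(child)
    return blocked


if __name__ == "__main__":
    # Hypothetical lineage following the stg_ -> int_ -> fct_/dim_ convention.
    lineage = {
        "stg_orders": [],
        "stg_customers": [],
        "int_orders_enriched": ["stg_orders", "stg_customers"],
        "fct_orders": ["int_orders_enriched"],
        "dim_customers": ["stg_customers"],
    }
    print(sorted(downstream_of("stg_orders", lineage)))
    # ['fct_orders', 'int_orders_enriched']
```

Models outside the failed branch (here `dim_customers`) can still refresh, which mirrors how `dbt build` skips only the nodes downstream of a failure.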
### Step 4 - Consumption & feedback
1. Expose marts to BI semantic layer / metrics store
2. Track query adoption and dataset freshness SLAs
3. Review unused models quarterly for deprecation
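For the quarterly review, a first pass can list models with no recent queries. The sketch assumes query activity has already been aggregated into a mapping of model name to last-query timestamp (for example, from the warehouse's query history); the 90-day default window and the `deprecation_candidates` name are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def deprecation_candidates(
    models: list[str],
    last_queried: dict[str, datetime],
    now: datetime,
    window: timedelta = timedelta(days=90),
) -> list[str]:
    """Models never queried, or not queried within `window`, sorted by name."""
    cutoff = now - window
    return sorted(
        m for m in models
        if m not in last_queried or last_queried[m] < cutoff
    )


if __name__ == "__main__":
    now = datetime(2026, 7, 1, tzinfo=timezone.utc)
    last = {
        "fct_orders": now - timedelta(days=2),
        "dim_legacy_region": now - timedelta(days=200),
    }
    models = ["fct_orders", "dim_legacy_region", "fct_unused"]
    print(deprecation_candidates(models, last, now))
    # ['dim_legacy_region', 'fct_unused']
```

The output is a candidate list for human review, not an automatic drop list; a model may be idle because it feeds a seasonal report.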
## Non-Functional Requirements
- **Security:** RBAC on the warehouse; PII masked in marts
- **Cost:** Warehouse auto-suspend enabled; incremental models by default
- **Governance:** All models documented in dbt and the catalog
- **Observability:** Freshness checks on the 20 most critical marts
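Freshness checks typically compare each mart's latest load time against a warn threshold and an error threshold, similar in spirit to the `warn_after`/`error_after` settings of dbt source freshness. A minimal sketch; the 12-hour and 24-hour thresholds, the `check_freshness` function, and the input format are assumptions:

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Freshness(Enum):
    PASS = "pass"
    WARN = "warn"
    ERROR = "error"


def check_freshness(
    loaded_at: datetime,
    now: datetime,
    warn_after: timedelta = timedelta(hours=12),
    error_after: timedelta = timedelta(hours=24),
) -> Freshness:
    """Classify a dataset by how long ago it was last loaded."""
    age = now - loaded_at
    if age > error_after:
        return Freshness.ERROR
    if age > warn_after:
        return Freshness.WARN
    return Freshness.PASS


if __name__ == "__main__":
    now = datetime(2026, 7, 2, 9, 0, tzinfo=timezone.utc)
    last_loads = {  # hypothetical latest load timestamps per critical mart
        "fct_orders": now - timedelta(hours=3),
        "fct_payments": now - timedelta(hours=15),
        "dim_customers": now - timedelta(hours=30),
    }
    for mart, loaded in last_loads.items():
        print(mart, check_freshness(loaded, now).value)
```

Here `fct_orders` passes, `fct_payments` warns, and `dim_customers` errors; an ERROR result is what would page `{{ONCALL_CHANNEL}}`.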
## Implementation Checklist
- [ ] Define naming conventions (stg_, int_, fct_, dim_)
- [ ] Set up dev/staging/prod environments
- [ ] Configure dbt Cloud or CI pipeline
- [ ] Wire Airflow sensors to ingestion completion
- [ ] Establish on-call runbook for pipeline failures
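Once the naming conventions are agreed, they are easy to enforce in CI. The sketch assumes the four prefixes from this template plus an added rule that names be lowercase snake_case; both the snake_case rule and the `check_model_name` function are assumptions to adapt.

```python
import re
from typing import Optional

# Prefixes from this template; the snake_case rule is an added assumption.
LAYER_PREFIXES = ("stg_", "int_", "fct_", "dim_")
NAME_PATTERN = re.compile(r"(stg|int|fct|dim)_[a-z0-9]+(?:_[a-z0-9]+)*")


def check_model_name(name: str) -> Optional[str]:
    """Return an error message for a non-conforming model name, else None."""
    if not name.startswith(LAYER_PREFIXES):
        return f"{name}: must start with one of {', '.join(LAYER_PREFIXES)}"
    if not NAME_PATTERN.fullmatch(name):
        return f"{name}: must be lowercase snake_case after the prefix"
    return None


if __name__ == "__main__":
    for name in ["stg_orders", "fct_order_items", "orders_final", "dim_Customers"]:
        print(name, check_model_name(name))
```

Running this over the model file names in a pull request and failing on any non-`None` result keeps the layer prefixes consistent.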
## {{ORGANIZATION_NAME}} Customization Notes
Replace tool names, team owners, and SLAs above. Add domain-specific marts and lineage diagrams as appendices.
How to use this architecture
- Use in architecture review meetings or RFC documents
- Map each component to your cloud accounts, teams, and tools
- Replace {{PLACEHOLDERS}} with environment-specific values
- Extend workflow steps with your org's SLAs and governance gates
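Filling in the `{{PLACEHOLDERS}}` can be scripted so that any value left unset is caught before the document is published. A minimal sketch, assuming placeholders are uppercase identifiers in double braces as they appear in this template; the `render` function and the choice to raise on missing values are illustrative.

```python
import re

# Matches {{NAME}} where NAME uses uppercase letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Substitute {{NAME}} placeholders, failing if any NAME has no value."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"unresolved placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    line = "Land raw data in `{{RAW_DATABASE}}.{{RAW_SCHEMA}}`"
    print(render(line, {"RAW_DATABASE": "RAW", "RAW_SCHEMA": "SALESFORCE"}))
    # Land raw data in `RAW.SALESFORCE`
```

Failing on unresolved names, rather than leaving them in place, prevents a half-customized architecture document from circulating in a review.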
Updated Jul 2, 2026