Module Overview: Data-Intensive Systems
"Design a system that stores and queries massive datasets."
This module focuses on data modeling, storage choices, and architecture patterns for systems where data volume, query performance, and correctness dominate the design.
Learning Goals
- Choose storage engines based on access patterns
- Apply indexing, partitioning, and caching strategies
- Model data pipelines (ingest → process → serve) with a clear responsibility for each stage
- Reason about durability, retention, and cost
Interview Preparation
- ✅ Explain OLTP vs OLAP trade-offs
- ✅ Discuss partition keys, hotspots, and compaction
- ✅ Design pipelines and serving layers
Real-World Application
- Analytics and reporting platforms
- Search and recommendation backends
- Event data lakes and warehouses
Estimated Time
45–60 minutes, including practice
Checklist
- Can map queries to storage choices
- Can design partitions and indexes
- Can explain retention and backfills