Module Overview: Data-Intensive Systems

"Design a system that stores and queries massive datasets."

This module focuses on data modeling, storage choices, and architecture patterns for systems where data volume, query performance, and correctness dominate the design.

Learning Goals

  • Choose storage engines based on access patterns
  • Apply indexing, partitioning, and caching strategies
  • Model pipelines (ingest → process → serve) with clear responsibilities
  • Reason about durability, retention, and cost
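To make the ingest → process → serve split concrete, here is a minimal sketch in which each stage has a single responsibility. The record format (`user,amount` lines) and the function names `ingest`, `process`, and `serve` are assumptions chosen for illustration; they are not part of this module's material.

```python
from collections import defaultdict


def ingest(raw_lines):
    """Ingest: parse and validate raw records, dropping malformed ones."""
    events = []
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # wrong number of fields
        user, amount = parts
        try:
            events.append((user, float(amount)))
        except ValueError:
            continue  # amount is not numeric
    return events


def process(events):
    """Process: aggregate validated events into a precomputed view."""
    totals = defaultdict(float)
    for user, amount in events:
        totals[user] += amount
    return dict(totals)


def serve(view, user):
    """Serve: answer a query from the precomputed view only."""
    return view.get(user, 0.0)


# Hypothetical input: two malformed lines are dropped at ingest.
raw = ["alice,10.5", "bob,3", "bad line", "alice,4.5", "carol,oops"]
view = process(ingest(raw))
```

Because `serve` reads only the precomputed view, query latency does not depend on how much raw data was ingested. In a real system each stage would typically be a separate component with its own storage, but the boundaries would stay the same.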

Interview Preparation

  • ✅ Explain OLTP vs OLAP trade-offs
  • ✅ Discuss partition keys, hotspots, and compaction
  • ✅ Design pipelines and serving layers
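When discussing partition keys and hotspots, a small simulation can help. The sketch below assumes hash partitioning across 8 partitions and a hypothetical workload in which one tenant produces 90% of the events; the helper names and the workload are illustrative assumptions.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (MD5 used only for spreading)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_per_partition(keys, num_partitions):
    """Count how many records land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical skewed workload: 900 events from one tenant, 100 from 50 others.
events = [("tenant-big", i) for i in range(900)]
events += [(f"tenant-{i % 50}", i) for i in range(100)]

# Partitioning by tenant alone sends all 900 events to a single partition.
by_tenant = load_per_partition([t for t, _ in events], 8)

# A composite key (tenant plus event id) spreads the same events out.
by_tenant_and_event = load_per_partition([f"{t}:{i}" for t, i in events], 8)
```

The composite key removes the write hotspot, but at a cost: reading all events for one tenant now has to fan out across every partition. That trade-off between write distribution and read locality is usually the point worth articulating in an interview.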

Real-World Application

  • Analytics and reporting platforms
  • Search and recommendation backends
  • Event data lakes and warehouses

Estimated Time

45-60 minutes (includes practice)

Checklist

  • Can map queries to storage choices
  • Can design partitions and indexes
  • Can explain retention and backfills