Lesson 3: Caching
Why Cache? The Performance Multiplier
Caching is the process of storing copies of data in a temporary storage location (cache) so that future requests for that data can be served faster. It's one of the most effective performance optimizations in system design.
Without caching:
User Request → Database (100-500ms latency)
↓
Disk I/O, network round-trip
With caching:
User Request → Cache Hit (1-5ms latency) ✓
↓
Return data immediately (no DB query)
Cache miss scenario:
User Request → Cache Miss
↓
Database Query (100-500ms)
↓
Store in Cache
↓
Return data
The Performance Impact
| System | Without Cache | With Cache | Improvement |
|---|---|---|---|
| User Profile | 200ms (DB) | 2ms (Redis) | 100x faster |
| Product Catalog | 150ms (DB) | 5ms (Redis) | 30x faster |
| API Response | 500ms (DB + joins) | 10ms (Redis) | 50x faster |
| Session Data | 50ms (DB) | 1ms (in-memory) | 50x faster |
Caching Strategies
Cache-Aside (Lazy Loading)
How it works:
- App checks cache
- If miss, App reads from DB
- App writes to cache
- Next request hits cache
┌─────────────┐
│ Application │
└──────┬──────┘
│
├───────── Check Cache
│
├─ Hit? → Return (1-5ms)
│
└─ Miss? → Query DB (100-500ms)
↓
Write to Cache
↓
Return
Pros:
- ✅ Only requested data is cached (efficient cache usage)
- ✅ Simple to implement
- ✅ Works with any database
- ✅ Cache failure doesn't break app (fallback to DB)
Cons:
- ❌ Initial request is slow (cache miss penalty)
- ❌ Can have cache stampede (thundering herd) when multiple requests miss simultaneously
- ❌ Stale data until cache expires or is invalidated
Real-world use: Most applications, product catalogs, user profiles, API responses
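The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production cache client; `fetch_from_db` is a hypothetical stand-in for a real database query:

```python
cache = {}

def fetch_from_db(key):
    # Stand-in for a slow database read (100-500ms in practice).
    return f"db-value-for-{key}"

def get(key):
    if key in cache:            # 1. check cache
        return cache[key]       # 2. hit -> return immediately
    value = fetch_from_db(key)  # 3. miss -> read from the database
    cache[key] = value          # 4. populate cache for the next request
    return value

print(get("user:42"))  # first call: cache miss, hits the DB
print(get("user:42"))  # second call: served from the cache
```

In production the cache would be an external store such as Redis, and the value written in step 4 would usually carry a TTL.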
Write-Through
How it works:
- App writes to cache and DB simultaneously
- Reads always check cache first
┌─────────────┐
│ Application │
└──────┬──────┘
│
├───── Write Request
│
├─────────────────┬
│ │
▼ ▼
Write Cache Write DB
(synchronous) (synchronous)
│ │
└────────┬────────┘
▼
Return (wait for both)
Pros:
- ✅ Data in cache is always fresh
- ✅ Strong consistency between cache and DB
- ✅ Simple read logic (always check cache)
Cons:
- ❌ Slower writes (synchronous DB write)
- ❌ Caches every written item, even data that is never read (wasted cache space)
- ❌ Higher latency for write operations
Real-world use: User sessions, configuration data that must be consistent
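Write-through can be sketched with two in-memory dicts standing in for the cache and the database (an illustration only; real code would call a cache client and a DB driver, and handle partial failure between the two writes):

```python
cache = {}
db = {}

def write_through(key, value):
    # Both writes are synchronous: the call returns only after the
    # database write, so the cache never diverges from the DB.
    cache[key] = value
    db[key] = value

def read(key):
    # Reads always check the cache first; with write-through it is
    # always fresh for keys written through this path.
    return cache.get(key, db.get(key))

write_through("session:1", {"user_id": 42})
```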
Write-Back (Write-Behind)
How it works:
- App writes to cache only
- Cache writes to DB asynchronously
- Reads check cache, fallback to DB if miss
┌─────────────┐
│ Application │
└──────┬──────┘
│
├───── Write Request
│
▼
Write Cache (immediate)
│
▼
Return (immediate)
└── Asynchronously → Write DB (background)
Pros:
- ✅ Extremely fast writes
- ✅ Can batch writes to DB (reduced DB load)
- ✅ Better write throughput
Cons:
- ❌ Data loss risk if cache crashes before syncing
- ❌ Complexity in handling failures
- ❌ Eventual consistency (DB might be behind)
- ❌ Data durability concerns
Real-world use: Write-heavy systems, analytics events, clickstream data
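The same toy setup illustrates write-back: writes touch only the cache, and a separate flush step persists them later. In a real system the flush runs in a background thread or worker, which is exactly where the data-loss window comes from:

```python
from collections import deque

cache = {}
db = {}
dirty = deque()  # keys written to the cache but not yet persisted

def write_back(key, value):
    cache[key] = value  # fast: memory only, returns immediately
    dirty.append(key)   # remember to persist later

def flush():
    # Background sync: batch-persist dirty keys to the database.
    # If the cache process dies before this runs, those writes are lost.
    while dirty:
        key = dirty.popleft()
        db[key] = cache[key]

write_back("event:1", "click")
write_back("event:2", "view")
# At this point db is still empty -- the writes exist only in the cache.
flush()
```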
Refresh-Ahead
How it works:
- Background job refreshes cache before expiration
- Ensures hot data is always in cache
- Users never see cache misses for popular data
┌─────────────┐
│ Application │
└──────┬──────┘
│
├───── Read Request
│
├─ Hit? → Return
│
└─ Miss? → Query DB → Return
↓
Trigger background refresh
┌─────────────────┐
│ Background Job │
└─────────┬───────┘
│
├─ Find expiring cache entries
├─ Refresh from DB
└─ Update cache (before users request)
Pros:
- ✅ Users never see cache misses for hot data
- ✅ Predictive caching for known hot keys
- ✅ Better user experience
Cons:
- ❌ Complexity in predicting what to refresh
- ❌ Wasted resources if predictions are wrong
- ❌ Background processing adds complexity
Real-world use: Product catalogs, leaderboards, trending content
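A refresh-ahead job can be sketched as a loop over entries nearing expiry. This is a simplified sketch; `load_from_db`, `TTL`, and `REFRESH_MARGIN` are illustrative names, not a real API:

```python
import time

TTL = 60             # normal cache lifetime (seconds)
REFRESH_MARGIN = 10  # refresh entries this close to expiry

cache = {}           # key -> (value, expires_at)

def load_from_db(key):
    # Stand-in for the real database read.
    return f"fresh-{key}"

def refresh_ahead():
    # Background job: renew entries before they expire so readers
    # never see a miss for hot keys.
    now = time.time()
    for key, (value, expires_at) in list(cache.items()):
        if expires_at - now < REFRESH_MARGIN:
            cache[key] = (load_from_db(key), now + TTL)

# Seed a hot key that is about to expire, then run the refresher.
cache["feed:top"] = ("stale-feed", time.time() + 5)
refresh_ahead()
```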
Eviction Policies
When the cache is full, what do you remove? This decision significantly impacts cache hit rates.
LRU (Least Recently Used)
How it works: Remove the item that hasn't been used for the longest time.
Example:
Cache at capacity (1000 items):
├── User A profile (last accessed: 1 hour ago) ← Evict this
├── Product catalog entry (last accessed: 5 minutes ago)
├── Session data (last accessed: 2 minutes ago)
└── ...
The cache is full, so User A's profile is evicted (least recently accessed)
Pros:
- ✅ Simple to implement
- ✅ Works well for temporal locality (recently accessed likely to be accessed again)
- ✅ Good for general-purpose caching
Cons:
- ❌ Doesn't account for access frequency (popular items might be evicted)
- ❌ Can be suboptimal for certain access patterns
- ❌ Requires tracking access time (some overhead)
Real-world use: Most default cache implementations, web caches, database caches
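An LRU cache maps naturally onto Python's `OrderedDict`, which tracks insertion order and can move a key to the end on access. This is a compact sketch of the policy, not a production implementation:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")      # "a" is now the most recently used key
lru.put("c", 3)   # cache is full: "b" is evicted, not "a"
```

Real implementations (e.g. `functools.lru_cache`) use the same idea with O(1) operations on a linked list plus hash map.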
LFU (Least Frequently Used)
How it works: Remove the item that has been accessed least often overall.
Example:
Cache at capacity (1000 items):
├── User A profile (access count: 5) ← Evict this
├── Product catalog entry (access count: 10,000)
├── Session data (access count: 50)
└── ...
The cache is full, so User A's profile is evicted (least frequently accessed)
Pros:
- ✅ Keeps most popular items in cache
- ✅ Good for workloads with clear popularity patterns
- ✅ Optimizes for cache hit rate
Cons:
- ❌ Requires tracking access frequency (more memory overhead)
- ❌ Can lock in old popular items (cold data problem)
- ❌ Doesn't adapt to changing access patterns quickly
Real-world use: Content distribution networks, recommendation systems, API response caches
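LFU needs an access counter per key. A simple sketch (eviction here scans all keys, so it is O(n); production LFU caches use frequency buckets for O(1) eviction):

```python
from collections import Counter

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}
        self.counts = Counter()  # per-key access frequency

    def get(self, key):
        if key not in self.items:
            return None
        self.counts[key] += 1
        return self.items[key]

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            # Evict the least frequently accessed key.
            victim = min(self.items, key=lambda k: self.counts[k])
            del self.items[victim]
            del self.counts[victim]
        self.items[key] = value
        self.counts[key] += 1

lfu = LFUCache(2)
lfu.put("a", 1)
lfu.put("b", 2)
lfu.get("a")
lfu.get("a")      # "a" is clearly more popular than "b"
lfu.put("c", 3)   # cache is full: "b" (lowest count) is evicted
```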
FIFO (First In, First Out)
How it works: Remove the oldest item regardless of access patterns.
Example:
Cache at capacity (1000 items):
├── Item 1 (added 1 hour ago) ← Evict this
├── Item 2 (added 30 minutes ago)
├── Item 3 (added 15 minutes ago)
└── ...
The cache is full, so Item 1 is evicted (oldest insertion)
Pros:
- ✅ Simplest to implement
- ✅ No tracking needed (just insertion order)
- ✅ Deterministic behavior
Cons:
- ❌ No consideration of access patterns
- ❌ Can evict hot items if they were cached early
- ❌ Poor cache hit rate in most scenarios
Real-world use: Simple buffers and queue-like caches where insertion order matters more than access patterns
TTL (Time To Live)
How it works: Remove items after a fixed time period, regardless of usage.
Example:
Cache entries with TTL:
├── User session (TTL: 30 minutes) ← Auto-remove after 30min
├── Product price (TTL: 5 minutes) ← Auto-remove after 5min
├── Configuration (TTL: 1 hour) ← Auto-remove after 1hour
└── News feed (TTL: 1 minute) ← Auto-remove after 1min
Pros:
- ✅ Guarantees data freshness
- ✅ Simple to understand and reason about
- ✅ Works great for time-sensitive data
- ✅ Automatic cleanup (no manual eviction needed)
Cons:
- ❌ Popular items might expire and be re-fetched
- ❌ Need to choose appropriate TTL (tuning required)
- ❌ Can cause cache stampede if popular items expire simultaneously
Real-world use: Session data, real-time prices, news feeds, API rate limiting
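A TTL cache can be sketched with lazy expiry: each entry stores its expiration timestamp, and reads discard entries that have passed it (real caches like Redis also sweep expired keys in the background):

```python
import time

class TTLCache:
    def __init__(self):
        self.items = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl_seconds):
        self.items[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self.items[key]  # lazy expiry: clean up on read
            return None
        return value

c = TTLCache()
c.put("price:sku-1", 19.99, ttl_seconds=300)  # fresh for 5 minutes
```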
Random Replacement
How it works: Randomly select an item to evict.
Pros:
- ✅ Simple to implement
- ✅ No tracking overhead
- ✅ Works surprisingly well in practice
Cons:
- ❌ Can evict hot items
- ❌ Suboptimal cache hit rate
- ❌ Unpredictable behavior
Real-world use: Simple caching implementations where tracking overhead is a concern
Real-World Case Studies
Case Study 1: Redis at Scale
The Challenge: Redis is used by companies like Twitter, Instagram, and Uber for ultra-low latency caching. How does Redis handle millions of requests per second with sub-millisecond latency?
The Architecture:
Clients → Redis Cluster (100+ nodes)
├── Node 1-20: US-East
├── Node 21-40: US-West
├── Node 41-60: EU-Central
└── Node 61-80: AP-Southeast
Sharding Strategy: Consistent hashing of keys
Replication: Primary-replica for each shard
Persistence: AOF (Append Only File) + RDB snapshots
Key Optimizations:
- In-memory data structures - No disk I/O for reads
- Single-threaded command execution - No locking overhead, naturally atomic operations
- Efficient data structures - Hash tables, skip lists, etc.
- Pipelining - Send multiple commands in one network round-trip
- Lua scripting - Execute multiple operations atomically on server
Performance Results:
- Throughput: 10M+ operations/second per cluster
- Latency: <1ms p50, <5ms p99
- Memory usage: 100GB+ RAM across cluster
- Hit rate: 95-98% for hot keys
- Capacity: 1B+ keys stored
Use Cases:
- Session storage (10M+ active sessions)
- Rate limiting (100K+ requests/second)
- Real-time leaderboards
- Pub/Sub messaging (100K+ messages/second)
💡 Key Insight: Redis achieves incredible performance by keeping everything in memory and executing commands on a single thread. No locks means no contention. The trade-off is that it is memory-intensive and limited by RAM capacity.
Case Study 2: Facebook's Caching Strategy
The Challenge: Facebook (now Meta) serves billions of users with complex social graphs, news feeds, and real-time interactions. Caching is critical for performance at this scale.
The Caching Architecture:
Facebook Platform Caching Layers:
├── Edge Caching (Akamai CDN)
│ ├── Static assets (images, CSS, JS)
│ └── HTML content (cache time: 1-5 minutes)
│
├── Application Caching (Tao + Memcached)
│ ├── User profiles
│ ├── Friend lists
│ ├── News feed entries
│ └── Permissions
│
├── Database Caching (MySQL + caching layer)
│ ├── Hot database rows
│ ├── Query results
│ └── Join results
│
└── Client Caching (Browser)
├── API responses
├── Static resources
└── Service Worker cache
Tao (The Associations and Objects):
- Facebook's distributed, read-optimized cache for the social graph (objects and their associations)
- 100+ terabytes of cache across multiple datacenters
- Cache hit rate: 98-99% for read-heavy workloads
- Consistent hashing for key distribution
- Hot items replicated across multiple caches
Memcached Configuration:
Cluster: 10,000+ servers
Total capacity: 10+ TB of RAM
Items cached: 1+ trillion objects
Hit rate: 98% (overall)
Latency: <5ms p95
News Feed Caching Strategy:
User A's News Feed Generation:
├── Check cache for user's feed (TTL: 5 minutes)
├── If miss, generate from:
│ ├── Friend list (cached, TTL: 1 hour)
│ ├── Friend's posts (cached, TTL: 10 minutes)
│ ├── Friend's likes (cached, TTL: 5 minutes)
│ └── Ranking algorithm (real-time)
├── Assemble feed
└── Cache result (TTL: 5 minutes)
Performance Results:
- News feed generation: From 500ms (DB) to 10ms (cache)
- User profile loads: From 200ms (DB) to 2ms (cache)
- Social graph queries: From 100ms (DB) to 5ms (cache)
- Overall hit rate: 98% across all cache layers
💡 Key Insight: Facebook uses multiple cache layers (edge, application, database, client) with different TTLs for each layer. This multi-layer approach gives them both performance (edge cache) and freshness (shorter TTLs at application layer). The complexity is high, but the performance improvement is massive.
Case Study 3: CDN Caching (CloudFront, Cloudflare, Akamai)
The Challenge: Serve static assets and content to users globally with low latency. How do CDNs achieve sub-100ms latency worldwide?
The Architecture:
User Request → Edge Server (closest geographically)
↓ (cache hit: <10ms)
Return cached content
User Request → Edge Server (cache miss)
↓ (fetch from origin)
Origin Server
↓ (return)
Edge Server (cache for future)
↓
Return to user
CloudFront Caching:
AWS CloudFront Global Network:
├── 600+ edge locations (points of presence) worldwide
└── 13 regional edge caches
Cache Behaviors:
├── Static assets (images, CSS, JS): 1 year TTL
├── API responses: 5-60 minute TTL
├── HTML content: 1-5 minute TTL
└── Dynamic content: no caching (bypass)
Cache-Control Headers:
Cache-Control: max-age=3600, public
→ Cache for 1 hour, CDNs and browsers can cache
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
→ No caching, always fetch from origin
Cache-Control: max-age=86400, s-maxage=3600, public
→ Browser cache for 1 day, CDN cache for 1 hour
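To make the directive combinations above concrete, here is a small helper that assembles a `Cache-Control` value. This is a sketch; in practice you set this header through your web framework or CDN configuration:

```python
def cache_control(max_age=None, s_maxage=None, public=False, no_store=False):
    # Build a Cache-Control header value from the directives above.
    parts = []
    if no_store:
        parts += ["max-age=0", "no-cache", "no-store", "must-revalidate"]
    else:
        if max_age is not None:
            parts.append(f"max-age={max_age}")      # browser TTL (seconds)
        if s_maxage is not None:
            parts.append(f"s-maxage={s_maxage}")    # shared/CDN cache TTL
        if public:
            parts.append("public")
    return ", ".join(parts)

print(cache_control(max_age=86400, s_maxage=3600, public=True))
# max-age=86400, s-maxage=3600, public
```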
Performance Results:
| Scenario | Without CDN | With CDN | Improvement |
|---|---|---|---|
| Static asset from US to Asia | 500ms | 50ms | 10x faster |
| API response from Europe to US | 300ms | 30ms | 10x faster |
| Video streaming (HD) | 2000ms | 200ms | 10x faster |
| Edge locations covered | 0 (origin only) | 600+ locations | Global |
Cache Invalidation:
- Time-based: Automatic expiration based on TTL
- Manual: Invalidate specific paths or objects
- Purge: Remove from all edge locations (takes 5-30 minutes)
💡 Key Insight: CDNs use geographic distribution and massive scale (thousands of edge servers) to achieve low latency globally. The cache is distributed across edge locations, with each location caching content for nearby users. This gives sub-100ms latency worldwide while offloading origin servers.
Production Metrics
Cache Performance Comparison
| Cache System | Throughput | Latency (p95) | Hit Rate | Capacity |
|---|---|---|---|---|
| Redis | 10M+ ops/sec | <5ms | 95-98% | 1GB-1TB |
| Memcached | 100M+ ops/sec | <2ms | 90-95% | 10GB-100TB |
| Varnish | 10K+ req/sec | <10ms | 80-90% | 10GB-100GB |
| CDN (CloudFront) | 1M+ req/sec | <50ms | 70-80% | Unlimited |
Eviction Policy Performance
| Policy | Hit Rate | Complexity | Memory Overhead | Best For |
|---|---|---|---|---|
| LRU | 70-80% | Low | Low | General purpose |
| LFU | 75-85% | Medium | Medium | Popularity-based |
| TTL | 60-90% | Low | Very Low | Time-sensitive data |
| Random | 65-75% | Very Low | None | Simple systems |
Trade-Off Scenarios
Scenario 1: E-Commerce Product Catalog
Context: Building an e-commerce platform with 1M+ products. 90% of traffic is browsing products (read-heavy), 10% is purchasing (write-heavy). Need to handle Black Friday spikes (10x normal traffic).
The Trade-Off Decisions:
| Decision | Option A | Option B | What You Choose & Why |
|---|---|---|---|
| Cache Strategy | Write-Through | Cache-Aside | Cache-Aside - Only cache hot products, not every product |
| Eviction Policy | LRU | TTL | LRU - Popular products stay in cache, cold products evicted |
| TTL for Prices | 1 hour | 5 minutes | 5 minutes - Prices change frequently, must stay fresh |
| TTL for Details | 1 day | 1 hour | 1 hour - Product details change less often than prices |
| Cache Invalidation | Time-based (TTL) | Manual purge | TTL + Manual purge - Auto-expire for simplicity, manual purge for price changes |
| Cache Layer | Single Redis cluster | Multi-tier (Redis + CDN) | Redis + CDN - CDN for product images, Redis for API data |
Result:
- Pros: High cache hit rate (90%+), fresh pricing data, handles Black Friday traffic
- Cons: Manual cache invalidation adds complexity, dual cache layer adds operational overhead
- Performance: 90% cache hit rate, <10ms API response for cached products, handles 10x traffic spikes
Scenario 2: Real-Time Leaderboard
Context: Building a real-time game with global leaderboards. 10M+ players, millions of score updates per second. Rankings must be updated in real-time (<100ms latency).
The Trade-Off Decisions:
| Decision | Option A | Option B | What You Choose & Why |
|---|---|---|---|
| Cache Strategy | Cache-Aside | Write-Through | Write-Through - Leaderboard must always show latest scores |
| Data Structure | Sorted set (Redis ZSET) | Hash map + manual sorting | Sorted set - Built-in sorted structure, O(log N) updates |
| Cache Size | Top 1000 players | Top 1M players | Top 1000 - Cache only hot players, DB for full list |
| TTL | 1 minute | 10 seconds | 10 seconds - Scores update frequently, must stay fresh |
| Refresh Strategy | Client polls | Server push (WebSocket) | Server push - Real-time updates, no polling overhead |
| Backup Strategy | Snapshot every hour | AOF + RDB | AOF + RDB - AOF for durability, RDB for fast recovery |
Result:
- Pros: Real-time updates, efficient sorted queries, high write throughput
- Cons: Memory-intensive (sorted sets), complex backup strategy
- Performance: <50ms latency for top 1000, 1M+ score updates/second
Scenario 3: User Session Storage
Context: Building a web application with 1M+ active users. Sessions need to be fast (<5ms) and secure. Sessions should expire after 30 minutes of inactivity.
The Trade-Off Decisions:
| Decision | Option A | Option B | What You Choose & Why |
|---|---|---|---|
| Cache Strategy | Cache-Aside | Write-Back | Cache-Aside - Simpler, strong consistency needed |
| Storage | Database (SQL) | Redis | Redis - Sub-millisecond latency, automatic TTL |
| TTL | 30 minutes | 1 hour | 30 minutes - Security requirement (sessions expire) |
| Persistence | None | AOF (append-only file) | AOF - Prevent data loss if Redis crashes |
| Session Data | Minimal (user ID) | Full (user profile, cart) | Minimal in cache - Faster access, smaller cache, profile in DB |
| Geo-Distribution | Single datacenter | Multi-region | Multi-region - Global users, reduce latency |
| Backup Strategy | None | Replica set | Replica set - Prevent session loss on failure |
Result:
- Pros: Fast session access, automatic expiration, geo-distributed for low latency
- Cons: Redis is memory-intensive, complexity in multi-region setup
- Performance: <2ms session lookup, automatic TTL cleanup, 99.99% availability
Sruja Perspective: Modeling Caches
In Sruja, we document caching strategies with clear trade-offs and performance characteristics.
Why Model Caches?
Modeling caches in your architecture provides:
- Performance visibility - Track cache hit rates, latency improvements
- Strategy clarity - Document which caching strategy and why
- Failure analysis - Understand impact of cache failures
- Capacity planning - See cache size requirements and scaling needs
Example: E-Commerce Caching Architecture
import { * } from 'sruja.ai/stdlib'
ECommerce = system "E-Commerce Platform" {
description "Multi-tier caching strategy for e-commerce platform"
// EDGE CACHE: CDN for static assets
EdgeCache = container "CDN Cache" {
technology "CloudFront / Cloudflare"
description "Caches product images, CSS, JS globally"
tags ["cache", "edge", "cdn"]
tradeoff {
decision "Use CDN for static assets"
sacrifice "Cost (CDN services cost money)"
reason "Reduces latency 10x globally, offloads origin servers"
mitigation "Use cache-control headers, optimize asset sizes"
}
capacity {
ttl_images "1 year"
ttl_api "5-60 minutes"
ttl_html "1-5 minutes"
}
}
// APPLICATION CACHE: Redis for API data
ProductCache = container "Product Cache" {
technology "Redis Cluster"
description "Caches product details, prices, inventory"
tags ["cache", "application", "redis"]
tradeoff {
decision "Use Cache-Aside strategy"
sacrifice "Initial request latency (cache miss)"
reason "Only cache hot products, efficient memory usage"
mitigation "Refresh-ahead for popular products, monitor hit rate"
}
tradeoff {
decision "Use LRU eviction policy"
sacrifice "Memory for cold products (frequent evictions)"
reason "Popular products stay in cache, good temporal locality"
mitigation "TTL for prices (5min), details (1hour)"
}
slo {
latency {
p95 "10ms"
p99 "50ms"
window "7 days"
}
hit_rate {
target "90%"
window "24 hours"
}
}
capacity {
memory "100GB"
throughput "10M ops/sec"
hit_rate "90%"
}
}
// DATABASE: PostgreSQL for persistent storage
ProductDB = database "Product Database" {
technology "PostgreSQL"
description "Stores all product data persistently"
tags ["database", "sql"]
tradeoff {
decision "Use PostgreSQL for persistent storage"
sacrifice "Write speed (disk I/O slower than memory)"
reason "Strong consistency, ACID transactions, complex queries"
mitigation "Read replicas, caching layer"
}
}
// SERVICES
ProductService = container "Product Service" {
technology "Go"
description "API for product catalog and search"
}
// TRAFFIC FLOW
Client -> ProductService "Requests product"
ProductService -> ProductCache "Check cache (Cache-Aside, LRU, TTL)"
ProductService -> ProductDB "Query on cache miss"
ProductService -> EdgeCache "Serve images/CSS/JS (CDN)"
}
view index {
title "E-Commerce Caching Architecture"
include *
}
view caching {
title "Caching Strategy"
include ECommerce.ProductCache
}
Key Trade-Offs Documented
1. Cache Strategy Choice:
- Why Cache-Aside? Only cache hot data (efficient memory usage)
- Why LRU eviction? Popular products stay in cache (good temporal locality)
- Why different TTLs? Prices change frequently (5min), details don't (1hour)
2. CDN Integration:
- Use CDN for static assets (images, CSS, JS)
- Global edge locations for sub-100ms latency
- Cache-control headers for different TTLs
3. Consistency vs Performance:
- Sacrifice some write speed for cache consistency
- TTL ensures freshness even with Cache-Aside
- Manual cache invalidation for critical updates
4. Capacity Planning:
- 100GB Redis cache for 1M products
- 90% hit rate target
- 10M+ ops/sec throughput needed for Black Friday
Knowledge Check
Q: When should I use Cache-Aside vs Write-Through?
Cache-Aside when you only want to cache frequently accessed data (hot data). This is more memory-efficient because you don't cache everything. Initial request is slower (cache miss), but subsequent requests are fast.
Write-Through when you need strong consistency between cache and database. Every write updates both cache and DB synchronously, ensuring cache is always fresh. This is slower for writes but simpler for consistency-critical data like user sessions.
Q: My cache is filling up too fast. Which eviction policy should I use?
LRU (Least Recently Used) is a good default choice. It removes items that haven't been accessed recently, which works well for most workloads (temporal locality - recently accessed items are likely to be accessed again).
If you know your workload has clear popularity patterns (some items accessed 100x more than others), use LFU (Least Frequently Used). This keeps the most popular items in cache, improving hit rate for hot data.
Q: How do I prevent cache stampede (thundering herd)?
Cache stampede occurs when many requests miss for the same hot key simultaneously, all querying the database and overwhelming it.
Solutions:
- Locking - First request acquires lock, others wait and use cached result
- Refresh-ahead - Background job refreshes hot keys before expiration
- Probabilistic early expiration - Randomly expire some items early to spread load
- Request coalescing - Merge simultaneous requests for the same key
The best solution depends on your workload: locking is the simplest, while refresh-ahead gives the best user experience.
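Per-key locking with a double-check can be sketched as follows (single-process only; across multiple servers you would need a distributed lock or request coalescing at the cache layer):

```python
import threading

cache = {}
locks = {}                       # one lock per hot key
locks_guard = threading.Lock()   # protects the locks dict itself

def fetch_from_db(key):
    # Hypothetical stand-in for the expensive database query.
    return f"db-value-for-{key}"

def get_with_lock(key):
    value = cache.get(key)
    if value is not None:
        return value                       # fast path: cache hit
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        # Double-check: another thread may have filled the cache
        # while this one was waiting on the lock.
        value = cache.get(key)
        if value is None:
            value = fetch_from_db(key)     # only one thread hits the DB
            cache[key] = value
        return value
```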
Quiz: Test Your Knowledge
Q1: Which caching strategy only caches data that's actually requested?
- Write-Through
- Write-Back
- Cache-Aside
- Refresh-Ahead
Answer
Cache-Aside only caches data that's actually requested. When a request comes in:
- Check cache - if hit, return data
- If miss, query database
- Store in cache for next request
This is memory-efficient because you don't waste cache space on data nobody requests. Write-Through caches everything on write, regardless of whether it will be read.
Q2: Which eviction policy removes the item that hasn't been accessed for the longest time?
- LFU (Least Frequently Used)
- FIFO (First In, First Out)
- LRU (Least Recently Used)
- TTL (Time To Live)
Answer
LRU (Least Recently Used) removes the item that hasn't been accessed for the longest time. This works well for most workloads because of temporal locality - items that were accessed recently are likely to be accessed again.
FIFO removes the oldest item regardless of access pattern. LFU removes the least frequently accessed item. TTL removes items after a fixed time.
Q3: You're building a real-time leaderboard with frequent score updates. Which caching strategy should you use?
- Cache-Aside
- Write-Through
- Write-Back
- No caching needed
Answer
Write-Through ensures the cache always has the latest scores, which is critical for a real-time leaderboard. Every score update writes to both cache and database synchronously.
Cache-Aside would be problematic because score updates might not be immediately reflected in the cache. Write-Back is dangerous because if the cache crashes before syncing to database, you could lose score updates.
Q4: What's the main benefit of using a CDN (Content Delivery Network) for caching?
- Reduces database load
- Improves global latency by serving content from edge locations
- Improves cache hit rate
- Simplifies cache invalidation
Answer
Improves global latency by serving content from edge locations. CDNs have thousands of servers distributed globally. When a user requests content, it's served from the closest edge server, which could be just a few milliseconds away vs. hundreds of milliseconds from the origin server.
While CDNs also reduce origin server load, their primary benefit is geographic distribution for low latency. Cache hit rate depends on your caching strategy and TTL configuration, not the CDN itself.
Q5: Which of these is NOT a characteristic of Redis caching?
- In-memory storage for sub-millisecond latency
- Single-threaded architecture for atomic operations
- Automatic disk-based persistence by default
- Supports multiple data structures (strings, hashes, sets, sorted sets)
Answer
Automatic disk-based persistence by default is NOT a characteristic of Redis as a cache. Redis is primarily an in-memory store. It supports persistence options (periodic RDB snapshots and AOF, the Append Only File), but AOF is disabled by default and snapshots are only periodic, so recent writes can be lost if Redis crashes without durable persistence configured.
Redis is in-memory (fast), single-threaded (no locking overhead), and supports multiple data structures (strings, hashes, sets, sorted sets, lists).
Q6: Facebook uses multiple cache layers (edge, application, database, client). Why?
- Because they couldn't decide on one caching strategy
- To reduce complexity
- Each layer serves a different purpose with appropriate TTLs
- All cache layers do the same thing
Answer
Each layer serves a different purpose with appropriate TTLs. Facebook's caching strategy:
- Edge cache (CDN): Static assets with long TTLs (images, CSS, JS) - 1 day to 1 year
- Application cache (Tao/Memcached): Dynamic data with medium TTLs (friend lists, posts) - 5-60 minutes
- Database cache: Hot rows and query results with short TTLs - 1-10 minutes
- Client cache: Browser and Service Worker cache with controlled TTLs
This multi-layer approach gives them both performance (edge cache) and freshness (shorter TTLs at application layer) while optimizing for different access patterns at each layer.
Q7: What's the trade-off with Write-Back (Write-Behind) caching?
- Slower reads
- Risk of data loss if cache crashes before syncing
- Complex read logic
- Higher memory usage
Answer
Risk of data loss if cache crashes before syncing is the main trade-off with Write-Back caching. In Write-Back, writes go to cache immediately (fast) and the cache asynchronously writes to the database. If the cache crashes before syncing, those writes are lost forever.
Write-Back is very fast for writes and can batch database operations, but the data loss risk makes it unsuitable for critical data. Use it for analytics events, clickstream data, or other data where occasional loss is acceptable.
Q8: You're building an e-commerce platform with 1M products. 90% of traffic is browsing (read-heavy). Which caching strategy is best?
- Write-Back
- Write-Through
- Cache-Aside with LRU eviction
- No caching needed
Answer
Cache-Aside with LRU eviction is ideal for this scenario. Here's why:
- Cache-Aside - Only cache frequently accessed products (efficient memory usage, don't waste space on cold products)
- LRU eviction - Popular browsing products stay in cache (good temporal locality)
- Read-heavy workload - 90% of traffic is browsing, so cache hit rate will be high (90%+)
Write-Through would cache every product on write (inefficient). Write-Back has data loss risk (not acceptable for product data). Cache-Aside is perfect: cache the hot products (20% of products get 80% of traffic) and let LRU evict cold products.
Next Steps
Now that we understand caching strategies, eviction policies, and real-world implementations, let's put it all together with a comprehensive system design exercise.
👉 Complete Module 2: The Building Blocks
Up next: Module 3: Advanced Modeling - Learn about system boundaries, context mapping, and more complex architectural patterns!