Lesson 2: The Vocabulary of Scale
To design big systems, you need to speak the language.
1. Scaling: Up vs Out
When your website crashes because too many people are using it, you have two choices.
Vertical Scaling (Scaling Up)
"Get a bigger machine." You upgrade from a 4GB RAM server to a 64GB RAM server.
- Pros: Easy. No code changes.
- Cons: Expensive. Finite limit (you can't buy a 100TB RAM server... easily). Single point of failure.
Horizontal Scaling (Scaling Out)
"Get more machines." You buy 10 cheap servers and split the traffic between them.
- Pros: Near-infinite scale (Google runs millions of servers). Resilient (if one dies, the others take over).
- Cons: Complex. You need load balancers and data consistency strategies.
```mermaid
graph TD
    subgraph Vertical [Vertical Scaling]
        Small[Server] -- Upgrade --> Big[SERVER]
    end
    subgraph Horizontal [Horizontal Scaling]
        One[Server] -- Add More --> Many1[Server]
        One -- Add More --> Many2[Server]
        One -- Add More --> Many3[Server]
    end
```
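Scaling out is only useful if something splits the traffic between the machines. The simplest policy is round-robin: hand each incoming request to the next server in the pool. A minimal sketch in Python (the server names are hypothetical placeholders, not a real deployment):

```python
from itertools import cycle

# Hypothetical pool of identical app servers (in production: host:port pairs).
servers = ["server-1", "server-2", "server-3"]
_rotation = cycle(servers)

def route() -> str:
    """Round-robin: hand each incoming request to the next server in turn."""
    return next(_rotation)

# Ten requests spread evenly across the three servers.
assignments = [route() for _ in range(10)]
print(assignments)
# ['server-1', 'server-2', 'server-3', 'server-1', ...]
```

Real load balancers (NGINX, HAProxy, cloud LBs) add health checks and smarter policies, but the core idea is this rotation.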
2. Speed: Latency vs Throughput
In interviews, never just say "it needs to be fast". Be specific.
- Latency: The time it takes for one person to get a result.
- Metaphor: The time it takes to drive from A to B.
- Unit: Milliseconds (ms).
- Throughput: The number of people the system can serve at the same time.
- Metaphor: The width of the highway (how many cars per hour).
- Unit: Requests per Second (RPS).
Tip
Use the right word: A system can have low latency (fast response) but low throughput (crashes if 5 people use it). A highway can have high throughput (10 lanes) but high latency (traffic jam).
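You can see the two metrics fall out of the same measurement. A rough sketch, using a stand-in handler that sleeps for about 1 ms per request (the handler and request count are illustrative, not a real benchmark):

```python
import time

def measure(handler, n_requests: int):
    """Run a handler serially; report average latency (ms) and throughput (req/s)."""
    latencies = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        handler()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    avg_latency_ms = 1000 * sum(latencies) / len(latencies)
    throughput_rps = n_requests / elapsed
    return avg_latency_ms, throughput_rps

# Stand-in handler: each request "works" for ~1 ms.
latency_ms, rps = measure(lambda: time.sleep(0.001), n_requests=100)
print(f"avg latency ~ {latency_ms:.1f} ms, throughput ~ {rps:.0f} req/s")
```

Note that in this serial loop, throughput is just the inverse of latency: the single-lane road. Adding servers (horizontal scaling) raises throughput without changing the latency of any individual request.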
3. Sruja in Action
Sruja allows you to define horizontal scaling requirements explicitly using the scale block.
```
import { * } from 'sruja.ai/stdlib'

ECommerce = system "E-Commerce System" {
    WebServer = container "Web App" {
        technology "Rust, Axum"
        // Explicitly defining Horizontal Scaling
        scale {
            min 3             // Start with 3 servers
            max 100           // Scale out to 100
            metric "cpu > 80%"
        }
    }
    Database = database "Primary DB" {
        technology "PostgreSQL"
        // Describing Vertical Scaling via comments/description
        description "Running on a massive AWS r5.24xlarge instance (Vertical Scaling)"
    }
    WebServer -> Database "Reads/Writes"
}

view index {
    include *
}
```
Knowledge Check
Q: Why don't we just vertically scale forever?
Because of physics and economics. There is a limit to how fast a single CPU can be and how much RAM one box can hold. And if that one supercomputer catches fire, your entire business is down with it: a single point of failure.
Quiz: Test Your Knowledge
Ready to apply what you've learned? Take the interactive quiz for this lesson!
1. What type of scaling involves upgrading a single machine with more resources (more RAM, CPU, disk space)?
Click to see answer
Answer: Vertical
Alternative answers:
- vertical scaling
- scale up
- scale-up
Explanation: Vertical scaling (or scaling up) means making a single machine more powerful. Example: Upgrading from 4GB RAM to 64GB RAM on one server.
2. What type of scaling involves adding more machines to distribute the load?
Click to see answer
Answer: Horizontal
Alternative answers:
- horizontal scaling
- scale out
- scale-out
Explanation: Horizontal scaling (or scaling out) means adding more machines to handle increased load. Example: Adding 10 servers instead of upgrading one server to be more powerful.
3. Why don't we just vertically scale forever to handle all growth?
- a) Vertical scaling is always more expensive than horizontal scaling
- b) Vertical scaling requires more maintenance and monitoring
- c) Vertical scaling is only available on cloud platforms
- d) There are physical limits to how powerful a single machine can be, and it's a single point of failure
4. Your application needs to handle 10x traffic during a holiday sale (from 100K to 1M users per hour). You have 2 weeks to prepare. What's the best approach?
- a) Vertically scale by buying the most powerful server available (it handles 2M users)
- b) Rewrite the entire application to be microservices-based
- c) Tell users the site will be slow during the sale
- d) Implement horizontal scaling with auto-scaling groups that can add servers automatically based on load
5. A startup has a monolithic application running on a single server. They expect to grow from 100 to 10,000 users over the next year. What's their best scaling strategy?
- a) Immediately migrate to microservices architecture on Kubernetes
- b) Start with 100 servers to prepare for future growth
- c) Do nothing and hope the single server handles the load
- d) Start with vertical scaling, then migrate to horizontal scaling when needed
6. What's the main disadvantage of horizontal scaling?
- a) It's more expensive than vertical scaling
- b) It has a finite limit to how much you can scale
- c) It can't handle traffic spikes
- d) It introduces complexity in data consistency, load balancing, and distributed systems management
7. You're designing a high-frequency trading system where every microsecond matters. Which scaling approach is most appropriate?
- a) Horizontal scaling across multiple datacenters worldwide
- b) Caching everything and accepting stale data
- c) No scaling needed, as HFT systems don't handle much traffic
- d) Vertical scaling on a single machine in the same datacenter as the stock exchange
8. What term describes the time it takes for a single request to complete, measured in milliseconds?
Click to see answer
Answer: Latency
Alternative answers:
- response time
- latency
Explanation: Latency is the time from when a request is sent to when the response is received. Think of it as the time it takes to drive from point A to point B.
9. What term describes how many requests a system can handle simultaneously, measured in requests per second (RPS)?
Click to see answer
Answer: Throughput
Alternative answers:
- throughput
- capacity
- concurrent requests
Explanation: Throughput is the volume of work a system can handle. Think of it as the width of a highway—how many cars can travel per hour.
10. Can a system have low latency but low throughput?
- a) No, low latency always means high throughput
- b) No, these terms are synonyms
- c) Only in distributed systems
- d) Yes—a single-lane road has low latency (no traffic jam) but low throughput (few cars per hour)
11. YouTube must serve videos to millions of users simultaneously. What's the most important metric for their success?
- a) Low latency for video upload
- b) High throughput for video streaming
- c) Strong consistency for user preferences
- d) High throughput for video streaming with acceptable latency for video start
12. A REST API averages 50ms latency but can only handle 100 requests/second before becoming unresponsive. You need to support 1,000 requests/second. What's the first step?
- a) Optimize code to reduce latency from 50ms to 5ms
- b) Increase the timeout to handle more concurrent requests
- c) Add caching for everything
- d) Horizontally scale by running multiple instances behind a load balancer
13. Google Search needs to return results in under 500 milliseconds for 63,000 queries per second. What's their architectural approach?
- a) One supercomputer with infinite RAM
- b) Caching everything for 24 hours
- c) Accepting slower response times during peak hours
- d) Horizontal scaling with distributed computing, pre-computed indexes, and edge caching
14. Your database has a read-to-write ratio of 1000:1 (users read data 1000x more than they write it). What scaling strategy is most effective?
- a) Add more powerful CPUs for write operations
- b) Shard the database based on write patterns
- c) Optimize write queries since they're the bottleneck
- d) Use read replicas to distribute read load across multiple database copies
15. When should you choose vertical scaling over horizontal scaling?
- a) When you need to handle millions of concurrent users
- b) When your application has no shared state and is stateless
- c) When cost and complexity are not concerns
- d) When you need a quick solution, have low traffic, or your application has complex shared state
16. What component distributes incoming network traffic across multiple servers to enable horizontal scaling?
Click to see answer
Answer: Load balancer
Alternative answers:
- load balancer
- load balancers
- LB
- proxy
Explanation: Load balancers are the "traffic cops" that distribute requests across multiple servers, enabling horizontal scaling and providing resilience by routing around failed servers.
17. In a horizontally scaled system, what happens if one server fails?
- a) The entire system crashes
- b) All traffic stops until the server is repaired
- c) The load balancer sends more traffic to the failed server
- d) The load balancer stops sending traffic to the failed server and routes it to the remaining healthy servers
18. Sruja allows you to define horizontal scaling explicitly. What does min 3, max 100, metric "cpu > 80%" mean in a scale block?
- a) Always run exactly 3 servers, maximum CPU usage 80%
- b) Scale vertically when CPU is under 80%
- c) Never scale down below 3 servers regardless of CPU
- d) Start with 3 servers, add more up to 100 when CPU exceeds 80%, remove servers when CPU is below threshold
19. Your e-commerce site's product catalog page loads in 2 seconds, but during a sale it slows to 10 seconds. Which metric degraded?
- a) Throughput decreased
- b) The database ran out of storage
- c) The page size increased
- d) Latency increased (response time got worse) due to increased load on the system
20. A system has 99.9% uptime, meaning it can be down for about 8.76 hours per year. If you want 99.99% uptime, how much downtime is acceptable per year?
- a) 8.76 hours (same as 99.9%)
- b) 1 hour
- c) 1 minute
- d) About 52.6 minutes (8.76 hours / 10)
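The arithmetic behind "the nines" is just the unavailable fraction of a year. A quick check in Python:

```python
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_per_year_hours(availability: float) -> float:
    """Allowed downtime in hours/year for a given availability fraction."""
    return HOURS_PER_YEAR * (1 - availability)

print(f"99.9%  -> {downtime_per_year_hours(0.999):.2f} hours")       # 8.76 hours
print(f"99.99% -> {downtime_per_year_hours(0.9999) * 60:.1f} min")   # 52.6 minutes
```

Each extra nine divides the downtime budget by ten, which is why "four nines" is so much harder to operate than "three nines".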
This quiz covers:
- Vertical vs Horizontal scaling strategies
- When to use each scaling approach
- Latency vs Throughput concepts
- Real-world scaling scenarios (YouTube, Google, HFT)
- Load balancing and auto-scaling
- Practical scaling decisions
Next Steps
We have the mindset, and we have the words. Now let's draw. 👉 Lesson 3: The C4 Model (Visualizing Architecture)