
Lesson 2: The Vocabulary of Scale

To design big systems, you need to speak the language.

1. Scaling: Up vs Out

When your website crashes because too many people are using it, you have two choices.

Vertical Scaling (Scaling Up)

"Get a bigger machine." You upgrade from a 4GB RAM server to a 64GB RAM server.

  • Pros: Easy. No code changes.
  • Cons: Expensive. Finite limit (you can't buy a 100TB RAM server... easily). Single point of failure.

Horizontal Scaling (Scaling Out)

"Get more machines." You buy 10 cheap servers and split the traffic between them.

  • Pros: Near-infinite scale (Google runs millions of servers). Resilient (if one server dies, the others take over).
  • Cons: Complex. You need load balancers and data consistency strategies.

graph TD
    subgraph Vertical [Vertical Scaling]
        Small[Server] -- Upgrade --> Big[SERVER]
    end

    subgraph Horizontal [Horizontal Scaling]
        One[Server] -- Add More --> Many1[Server]
        One -- Add More --> Many2[Server]
        One -- Add More --> Many3[Server]
    end
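The trade-off can be made concrete with a toy capacity model. All of the numbers below (RPS per GB, RPS per server) are illustrative assumptions, not benchmarks:

```python
# Toy comparison of the two scaling strategies (numbers are illustrative).

def vertical_capacity(ram_gb: int, rps_per_gb: int = 100) -> int:
    """Scaling up: one machine's capacity grows with its hardware."""
    return ram_gb * rps_per_gb

def horizontal_capacity(servers: int, rps_per_server: int = 400) -> int:
    """Scaling out: total capacity grows with the server count."""
    return servers * rps_per_server

# Scaling up: 4 GB -> 64 GB on a single box.
print(vertical_capacity(4))     # 400 RPS
print(vertical_capacity(64))    # 6400 RPS

# Scaling out: 1 server -> 10 servers.
print(horizontal_capacity(10))  # 4000 RPS

# The fleet survives one failure; the single big box does not.
print(horizontal_capacity(9))   # 3600 RPS -- degraded, not dead
```

Note the asymmetry in the last line: losing one of ten servers costs 10% of capacity, while losing the single vertically-scaled machine costs 100%.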

2. Speed: Latency vs Throughput

In interviews, never just say "it needs to be fast". Be specific.

  • Latency: The time it takes for one person to get a result.
    • Metaphor: The time it takes to drive from A to B.
    • Unit: Milliseconds (ms).
  • Throughput: The number of requests the system can serve per unit of time.
    • Metaphor: The width of the highway (how many cars per hour).
    • Unit: Requests per Second (RPS).

Tip

Use the right word: a system can have low latency (each response is fast) but low throughput (it falls over if more than a handful of people use it at once). Conversely, a highway can have high throughput (10 lanes) but high latency (traffic jam).
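The relationship between the two metrics can be sketched with Little's law: throughput equals the number of requests in flight divided by the average latency. The concurrency and latency figures below are made-up examples:

```python
# Little's law: throughput (RPS) = concurrent requests / average latency (s).

def throughput_rps(concurrency: int, latency_ms: float) -> float:
    """Steady-state throughput given in-flight requests and latency."""
    return concurrency / (latency_ms / 1000.0)

# Fast but narrow: 50 ms latency, only 5 requests in flight.
print(throughput_rps(5, 50))      # 100.0 RPS

# Slow but wide: 500 ms latency, 1000 requests in flight.
print(throughput_rps(1000, 500))  # 2000.0 RPS
```

The second system is 10x slower per request yet serves 20x more traffic, which is exactly the latency-vs-throughput distinction the tip above warns about.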

3. Sruja in Action

Sruja allows you to define horizontal scaling requirements explicitly using the scale block.

import { * } from 'sruja.ai/stdlib'


ECommerce = system "E-Commerce System" {
    WebServer = container "Web App" {
        technology "Rust, Axum"

        // Explicitly defining Horizontal Scaling
        scale {
            min 3            // Start with 3 servers
            max 100          // Scale up to 100
            metric "cpu > 80%"
        }
    }

    Database = database "Primary DB" {
        technology "PostgreSQL"
        // Describing Vertical Scaling via comments/description
        description "Running on a massive AWS r5.24xlarge instance (Vertical Scaling)"
    }

    WebServer -> Database "Reads/Writes"
}

view index {
    include *
}
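Under the hood, a policy like `min 3` / `max 100` / `"cpu > 80%"` boils down to a simple control loop. This is a hypothetical sketch of what an autoscaler does on each tick, not Sruja's actual runtime (the scale-in threshold of half the scale-out threshold is an assumption):

```python
def desired_replicas(current: int, cpu_pct: float,
                     minimum: int = 3, maximum: int = 100,
                     threshold: float = 80.0) -> int:
    """Return the replica count after one autoscaling tick."""
    if cpu_pct > threshold:
        current += 1          # scale out under load
    elif cpu_pct < threshold / 2:
        current -= 1          # scale in when mostly idle
    # Never leave the [min, max] window.
    return max(minimum, min(maximum, current))

print(desired_replicas(3, 95.0))    # 4   -- CPU above 80%, add a server
print(desired_replicas(3, 10.0))    # 3   -- idle, but held at the floor
print(desired_replicas(100, 99.0))  # 100 -- overloaded, capped at the ceiling
```

The `min`/`max` clamp is the important part: the floor guarantees baseline capacity and redundancy, and the ceiling caps cost even during a runaway spike.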

Knowledge Check

Q: Why don't we just vertically scale forever?

Because of physics and risk. There is a limit to how fast a single CPU can be and how much RAM one machine can hold. And if that one super-computer catches fire, your entire business goes down with it.

Quiz: Test Your Knowledge

Ready to apply what you've learned? Take the interactive quiz for this lesson!

1. What type of scaling involves upgrading a single machine with more resources (more RAM, CPU, disk space)?

Click to see answer

Answer: Vertical

Alternative answers:

  • vertical scaling
  • scale up
  • scale-up

Explanation: Vertical scaling (or scaling up) means making a single machine more powerful. Example: Upgrading from 4GB RAM to 64GB RAM on one server.


2. What type of scaling involves adding more machines to distribute the load?

Click to see answer

Answer: Horizontal

Alternative answers:

  • horizontal scaling
  • scale out
  • scale-out

Explanation: Horizontal scaling (or scaling out) means adding more machines to handle increased load. Example: Adding 10 servers instead of upgrading one server to be more powerful.


3. Why don't we just vertically scale forever to handle all growth?

  • a) Vertical scaling is always more expensive than horizontal scaling
  • b) Vertical scaling requires more maintenance and monitoring
  • c) Vertical scaling is only available on cloud platforms
  • d) There are physical limits to how powerful a single machine can be, and it's a single point of failure


4. Your application needs to handle 10x traffic during a holiday sale (from 100K to 1M users per hour). You have 2 weeks to prepare. What's the best approach?

  • a) Vertically scale by buying the most powerful server available (it handles 2M users)
  • b) Rewrite the entire application to be microservices-based
  • c) Tell users the site will be slow during the sale
  • d) Implement horizontal scaling with auto-scaling groups that can add servers automatically based on load


5. A startup has a monolithic application running on a single server. They expect to grow from 100 to 10,000 users over the next year. What's their best scaling strategy?

  • a) Immediately migrate to microservices architecture on Kubernetes
  • b) Start with 100 servers to prepare for future growth
  • c) Do nothing and hope the single server handles the load
  • d) Start with vertical scaling, then migrate to horizontal scaling when needed


6. What's the main disadvantage of horizontal scaling?

  • a) It's more expensive than vertical scaling
  • b) It has a finite limit to how much you can scale
  • c) It can't handle traffic spikes
  • d) It introduces complexity in data consistency, load balancing, and distributed systems management


7. You're designing a high-frequency trading system where every microsecond matters. Which scaling approach is most appropriate?

  • a) Horizontal scaling across multiple datacenters worldwide
  • b) Caching everything and accepting stale data
  • c) No scaling needed, as HFT systems don't handle much traffic
  • d) Vertical scaling on a single machine in the same datacenter as the stock exchange


8. What term describes the time it takes for a single request to complete, measured in milliseconds?

Click to see answer

Answer: Latency

Alternative answers:

  • response time
  • latency

Explanation: Latency is the time from when a request is sent to when the response is received. Think of it as the time it takes to drive from point A to point B.


9. What term describes how many requests a system can handle simultaneously, measured in requests per second (RPS)?

Click to see answer

Answer: Throughput

Alternative answers:

  • throughput
  • capacity
  • concurrent requests

Explanation: Throughput is the volume of work a system can handle. Think of it as the width of a highway—how many cars can travel per hour.


10. Can a system have low latency but low throughput?

  • a) No, low latency always means high throughput
  • b) No, these terms are synonyms
  • c) Only in distributed systems
  • d) Yes—a single-lane road has low latency (no traffic jam) but low throughput (few cars per hour)


11. YouTube must serve videos to millions of users simultaneously. What's the most important metric for their success?

  • a) Low latency for video upload
  • b) High throughput for video streaming
  • c) Strong consistency for user preferences
  • d) High throughput for video streaming with acceptable latency for video start


12. A REST API averages 50ms latency but can only handle 100 requests/second before becoming unresponsive. You need to support 1,000 requests/second. What's the first step?

  • a) Optimize code to reduce latency from 50ms to 5ms
  • b) Increase the timeout to handle more concurrent requests
  • c) Add caching for everything
  • d) Horizontally scale by running multiple instances behind a load balancer


13. Google Search needs to return results in under 500 milliseconds for 63,000 queries per second. What's their architectural approach?

  • a) One supercomputer with infinite RAM
  • b) Caching everything for 24 hours
  • c) Accepting slower response times during peak hours
  • d) Horizontal scaling with distributed computing, pre-computed indexes, and edge caching


14. Your database has a read-to-write ratio of 1000:1 (users read data 1000x more than they write it). What scaling strategy is most effective?

  • a) Add more powerful CPUs for write operations
  • b) Shard the database based on write patterns
  • c) Optimize write queries since they're the bottleneck
  • d) Use read replicas to distribute read load across multiple database copies


15. When should you choose vertical scaling over horizontal scaling?

  • a) When you need to handle millions of concurrent users
  • b) When your application has no shared state and is stateless
  • c) When cost and complexity are not concerns
  • d) When you need a quick solution, have low traffic, or your application has complex shared state


16. What component distributes incoming network traffic across multiple servers to enable horizontal scaling?

Click to see answer

Answer: Load balancer

Alternative answers:

  • load balancer
  • load balancers
  • LB
  • proxy

Explanation: Load balancers are the "traffic cops" that distribute requests across multiple servers, enabling horizontal scaling and providing resilience by routing around failed servers.


17. In a horizontally scaled system, what happens if one server fails?

  • a) The entire system crashes
  • b) All traffic stops until the server is repaired
  • c) The load balancer sends more traffic to the failed server
  • d) The load balancer stops sending traffic to the failed server and routes it to the remaining healthy servers


18. Sruja allows you to define horizontal scaling explicitly. What does min 3, max 100, metric "cpu > 80%" mean in a scale block?

  • a) Always run exactly 3 servers, maximum CPU usage 80%
  • b) Scale vertically when CPU is under 80%
  • c) Never scale down below 3 servers regardless of CPU
  • d) Start with 3 servers, add more up to 100 when CPU exceeds 80%, remove servers when CPU is below threshold


19. Your e-commerce site's product catalog page loads in 2 seconds, but during a sale it slows to 10 seconds. Which metric degraded?

  • a) Throughput decreased
  • b) The database ran out of storage
  • c) The page size increased
  • d) Latency increased (response time got worse) due to increased load on the system


20. A system has 99.9% uptime, meaning it can be down for about 8.77 hours per year. If you want 99.99% uptime, how much downtime is acceptable per year?

  • a) 8.77 hours (same as 99.9%)
  • b) 1 hour
  • c) 1 minute
  • d) About 52.6 minutes (8.77 hours / 10)
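The "nines" arithmetic in this question generalizes: each extra nine cuts the allowed downtime by a factor of ten. A quick sketch (using 365.25 days per year, so the totals match the 8.77-hour figure above):

```python
HOURS_PER_YEAR = 365.25 * 24  # = 8766 hours

def downtime_per_year(availability_pct: float) -> float:
    """Allowed downtime in hours per year for a given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

print(round(downtime_per_year(99.9), 2))     # ~8.77 hours  ("three nines")
print(round(downtime_per_year(99.99) * 60))  # ~53 minutes  ("four nines")
```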


This quiz covers:

  • Vertical vs Horizontal scaling strategies
  • When to use each scaling approach
  • Latency vs Throughput concepts
  • Real-world scaling scenarios (YouTube, Google, HFT)
  • Load balancing and auto-scaling
  • Practical scaling decisions

Next Steps

We have the mindset, and we have the words. Now let's draw. 👉 Lesson 3: The C4 Model (Visualizing Architecture)