
In this lesson, you will:

  • Understand the difference between L4 and L7 load balancing
  • Learn load balancing algorithms and when to use each
  • See real-world examples from NGINX and HAProxy
  • Practice making trade-off decisions

Estimated time: 20 minutes | Difficulty: beginner

Lesson 1: Load Balancers

What is a Load Balancer?

A load balancer sits between clients and servers, distributing incoming network traffic across a group of backend servers. This ensures that no single server bears too much load.

Without a load balancer:

Clients → Server 1 (overwhelmed, 99% CPU)
        → Server 2 (idle, 5% CPU)
        → Server 3 (idle, 3% CPU)

With a load balancer:

Clients → Load Balancer → Server 1 (33% CPU)
                        → Server 2 (33% CPU)
                        → Server 3 (34% CPU)

Types of Load Balancing

Layer 4 (Transport Layer)

What it does: Makes decisions based on IP address and TCP/UDP ports.

Characteristics:

  • Fast - No content inspection, just reads packet headers
  • Low CPU - Simple routing logic
  • Limited intelligence - Can't inspect HTTP headers, URLs, or cookies
  • Protocol-agnostic - Works for HTTP, TCP, UDP, gRPC

Best for: High-throughput services where you need raw speed over intelligence.

Examples: AWS ELB (classic), HAProxy (L4 mode), F5 LTM

Layer 7 (Application Layer)

What it does: Makes decisions based on the content of the message (URL, HTTP headers, cookies).

Characteristics:

  • Slower - Needs to parse HTTP content
  • CPU intensive - More complex routing logic
  • Smart routing - Can route /images to image servers, /api to API servers
  • Content-aware - Inspects HTTP headers and cookies, and can terminate SSL

Best for: Web applications needing content-based routing, SSL termination, or session stickiness.

Examples: NGINX, AWS ALB, Envoy, Traefik
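
The content-based routing described above can be sketched in HAProxy's configuration language (a minimal illustration; the backend names `image_servers`, `api_servers`, and `web_servers` are hypothetical):

```haproxy
frontend http_in
    bind *:80
    # Classify requests by URL path prefix
    acl is_images path_beg /images
    acl is_api    path_beg /api
    # Route each class to its own backend pool
    use_backend image_servers if is_images
    use_backend api_servers   if is_api
    default_backend web_servers
```

A Layer 4 balancer cannot do this, because the URL path only exists after the HTTP request is parsed.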

Real-World Comparison

| Factor    | Layer 4              | Layer 7                                        |
|-----------|----------------------|------------------------------------------------|
| Speed     | 100K+ RPS per core   | 50K+ RPS per core                              |
| CPU Usage | 5-10%                | 30-50%                                         |
| Features  | Basic routing        | SSL termination, URL routing, session affinity |
| Use Case  | High-throughput APIs | Web apps, microservices                        |

Load Balancing Algorithms

Round Robin

How it works: Requests are distributed sequentially across servers: 1, 2, 3, 4, then back to 1.

Pros:

  • ✅ Simple to implement and understand
  • ✅ Works well when servers have similar capacity
  • ✅ No state to maintain

Cons:

  • ❌ Doesn't account for server load (one might be slow with 100 connections, another idle)
  • ❌ Doesn't work well when servers have different capabilities

Real-world use: Good for stateless services where all servers are identical (e.g., API servers).
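
Round robin reduces to cycling through the server list; a minimal Python sketch (server addresses are illustrative):

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = cycle(servers)  # yields 1, 2, 3, 1, 2, 3, ...

def pick_server():
    """Return the next server in the rotation."""
    return next(rotation)

# The fourth request wraps back to the first server.
picks = [pick_server() for _ in range(4)]
assert picks == ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.1"]
```

Note there is no per-server state at all, which is exactly why the algorithm is blind to how busy each server actually is.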

Least Connections

How it works: Sends request to the server with the fewest active connections.

Pros:

  • ✅ Better than round robin when servers have different loads
  • ✅ Accounts for connection time (long-running connections get fewer new ones)

Cons:

  • ❌ Still doesn't account for server capacity (CPU, memory)
  • ❌ Requires tracking active connections (more state)

Real-world use: Good for services with varying request processing times (e.g., database queries).
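
Least connections amounts to taking the minimum of a connection-count map; a Python sketch (counts are illustrative):

```python
def pick_least_connections(active):
    """Pick the server with the fewest active connections.
    `active` maps server name -> current open connection count."""
    return min(active, key=active.get)

active = {"app1": 12, "app2": 3, "app3": 7}
server = pick_least_connections(active)   # app2 has the fewest
active[server] += 1                       # track the new connection
```

The `active` dictionary is the extra state this algorithm requires: the balancer must increment on connect and decrement on close.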

IP Hash

How it works: Uses a hash of the client's IP address to determine which server receives the request.

Pros:

  • ✅ Session stickiness - Same IP always goes to same server
  • ✅ Works great for stateful applications (session stored in memory)

Cons:

  • ❌ Uneven distribution if many clients share the same IP (NAT, proxies)
  • ❌ Doesn't rebalance when servers are added/removed

Real-world use: Stateful applications needing session affinity, WebSocket connections.
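
IP hash can be sketched as hashing the client address modulo the server count (names are illustrative; real balancers such as HAProxy's `balance source` use their own hash function):

```python
import hashlib

servers = ["app1", "app2", "app3"]

def pick_by_ip(client_ip):
    """Hash the client IP so the same IP always maps to the same server."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client always lands on the same server...
assert pick_by_ip("203.0.113.7") == pick_by_ip("203.0.113.7")
# ...but the modulo means the mapping reshuffles when servers are added or removed.
```

That reshuffling on membership change is exactly the weakness that consistent hashing (discussed in the WhatsApp case study below) addresses.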

Random

How it works: Randomly selects a server for each request.

Pros:

  • ✅ No state to maintain
  • ✅ Even distribution over time (law of large numbers)

Cons:

  • ❌ Can have temporary uneven distribution
  • ❌ No session stickiness

Real-world use: Simple load balancing with a large number of requests where stickiness doesn't matter.
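
A Python sketch of random selection, showing the distribution evening out over many requests:

```python
import random

servers = ["web1", "web2", "web3"]

def pick_random():
    """Each request independently picks a uniformly random server."""
    return random.choice(servers)

# Over many requests the split approaches an even third per server.
counts = {s: 0 for s in servers}
for _ in range(30_000):
    counts[pick_random()] += 1
```

With 30,000 requests each server's share lands very close to 10,000, but over any short window one server can briefly receive a burst, which is the "temporary uneven distribution" noted above.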

Real-World Case Studies

Case Study 1: NGINX at Scale

The Challenge: NGINX (the company) needed to serve 1M+ requests per second (RPS) with low latency.

The Architecture:

Clients → Layer 4 Load Balancer (HAProxy) → NGINX Layer 7 LBs → Application Servers
          (10 Gbps)                         (100+ instances)    (1000+ servers)

Key Decisions:

  1. Layer 4 at edge - Raw speed for SSL termination and initial routing
  2. Layer 7 closer to apps - Content-based routing (/static vs /api)
  3. Least connections algorithm - Servers have varying loads

The Results:

  • Throughput: 1M+ RPS per load balancer
  • Latency: <10ms at p95
  • Server utilization: Balanced at 65-75% CPU (good headroom)
  • Availability: 99.99% with automatic failover

💡 Key Insight: NGINX uses a layered approach. Layer 4 for raw speed at the edge, Layer 7 for intelligence closer to the applications. This gives them both performance and features.

Case Study 2: HAProxy Configuration Patterns

Scenario: E-commerce platform with 3 types of traffic:

  • Product catalog (read-heavy, 90% of traffic)
  • Checkout (write-heavy, requires session stickiness)
  • API (stateless, needs high throughput)

The Solution:

# Product Catalog - Round Robin (read-heavy)
backend catalog
    balance roundrobin
    server web1 10.0.0.1:80 check
    server web2 10.0.0.2:80 check
    server web3 10.0.0.3:80 check

# Checkout - IP Hash (session stickiness)
backend checkout
    balance source
    server app1 10.0.0.10:8080 check
    server app2 10.0.0.11:8080 check

# API - Least Connections (varying request times)
backend api
    balance leastconn
    server api1 10.0.0.20:443 check
    server api2 10.0.0.21:443 check
    server api3 10.0.0.22:443 check

Why this works:

  • Catalog: Round robin works great for stateless, uniform requests
  • Checkout: IP hash ensures user stays on same server (session in memory)
  • API: Least connections handles varying query complexity (some API calls are fast, others slow)

Performance Results:

  • Catalog servers: 5000 RPS, 2ms latency
  • Checkout servers: 500 RPS, 20ms latency (acceptable for checkout flow)
  • API servers: 10000 RPS, 5ms latency (balanced load)

Case Study 3: WhatsApp's Load Balancing Evolution

Stage 1 (Early days): Single server → Crashed at 10K users

Stage 2: Round robin across 10 servers → Uneven load, some servers overwhelmed

Stage 3 (Solution): Erlang's built-in load balancing + consistent hashing

Clients → Load Balancer → Erlang Nodes → Message Store
          (Consistent hashing for key-based routing)

Key Decision: Use consistent hashing so messages from the same user always go to the same node. This reduces cross-node synchronization.

The Results:

  • Scale: From 10K users to 1B+ users
  • Throughput: 65B+ messages/day
  • Efficiency: Minimal cross-node data movement (99% of traffic stays local)
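
Consistent hashing can be sketched as a hash ring with virtual nodes, so adding or removing a node only remaps a small slice of keys rather than reshuffling everything. This is a simplified illustration, not WhatsApp's actual implementation:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: a key maps to the first node position
    at or after its own hash, wrapping around the ring."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas   # virtual nodes per real node, for even spread
        self.ring = {}             # ring position -> node name
        self.sorted_keys = []      # sorted ring positions
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self.ring[pos] = node
            bisect.insort(self.sorted_keys, pos)

    def get(self, key):
        """Walk clockwise from the key's hash to the nearest virtual node."""
        pos = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, pos) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

ring = HashRing(["node1", "node2", "node3"])
# Messages keyed by the same user id always route to the same node.
assert ring.get("user:42") == ring.get("user:42")
```

When a node is added, only the keys falling between it and its ring neighbors move; all other user-to-node assignments stay put, which is what keeps cross-node data movement minimal.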

Production Metrics

Load Balancer Performance

| System  | Algorithm              | Throughput | Latency (p95) | Servers |
|---------|------------------------|------------|---------------|---------|
| NGINX   | Round Robin            | 1M+ RPS    | <10ms         | 100     |
| HAProxy | Least Connections      | 500K RPS   | <5ms          | 50      |
| AWS ALB | Round Robin            | 100K RPS   | <50ms         | Managed |
| Envoy   | Random + Health Checks | 1M+ RPS    | <20ms         | 200     |

Resource Utilization

| Resource   | Layer 4 Load Balancer | Layer 7 Load Balancer |
|------------|-----------------------|-----------------------|
| CPU        | 5-15%                 | 30-60%                |
| Memory     | 1-2 GB                | 4-8 GB                |
| Network    | 10 Gbps               | 10 Gbps               |
| Throughput | 1M+ RPS               | 500K RPS              |

Trade-Off Scenarios

Scenario 1: API Gateway for Microservices

Context: Building an API gateway that routes to 50 microservices. Some are fast (profile service), others slow (report generation).

The Trade-Off:

| Decision      | Option A         | Option B          | What You Choose & Why |
|---------------|------------------|-------------------|-----------------------|
| Layer         | Layer 4 (fast)   | Layer 7 (smart)   | Layer 7 - Need URL-based routing (/users → users service) |
| Algorithm     | Round Robin      | Least Connections | Least Connections - Services have varying response times |
| Health Checks | Basic TCP        | HTTP /health      | HTTP /health - Need to detect slow/failing services, not just offline ones |
| SSL           | At load balancer | At service        | At load balancer - Centralized SSL management, cheaper certificates |

Result:

  • Pros: Intelligent routing, good load balancing, centralized SSL
  • Cons: Higher CPU usage, more complex configuration
  • Performance: 50K RPS at p95 < 50ms (acceptable for API gateway)

Scenario 2: Video Streaming Platform

Context: Streaming video to 10M concurrent users. Each stream needs sustained bandwidth (2-5 Mbps). Low latency is critical.

The Trade-Off:

| Decision         | Option A          | Option B           | What You Choose & Why |
|------------------|-------------------|--------------------|-----------------------|
| Layer            | Layer 4 (speed)   | Layer 7 (features) | Layer 4 - Raw speed for streaming, no content inspection needed |
| Algorithm        | Round Robin       | IP Hash            | Round Robin - Streams are independent, no session affinity needed |
| Geo-distribution | Single datacenter | Edge locations     | Edge locations - Reduce latency by serving from closest datacenter |

Result:

  • Pros: Maximum throughput, minimal latency, simple configuration
  • Cons: No intelligent routing (but not needed for streaming)
  • Performance: 1M+ concurrent streams, <100ms latency globally

Scenario 3: Stateful WebSocket Application

Context: Real-time chat application where users are connected via WebSockets. User messages and presence data must go to the same server.

The Trade-Off:

| Decision    | Option A          | Option B           | What You Choose & Why |
|-------------|-------------------|--------------------|-----------------------|
| Layer       | Layer 4           | Layer 7            | Layer 7 - Need to inspect WebSocket upgrade requests |
| Algorithm   | Round Robin       | IP Hash            | IP Hash - Session stickiness required for WebSocket connections |
| Failover    | Break connections | Graceful reconnect | Graceful reconnect - Clients auto-reconnect on disconnect |
| Persistence | In-memory         | Redis              | In-memory - Faster for local data, Redis for cross-server sync |

Result:

  • Pros: Session affinity, real-time performance, good user experience
  • Cons: Uneven distribution (some servers have more active users), complexity in failover
  • Performance: 100K concurrent WebSockets, <50ms message delivery

🛠️ Sruja Perspective: Modeling Load Balancers

In Sruja, we treat load balancers as critical infrastructure components with clear trade-offs documented.

Why Model Load Balancers?

Modeling load balancers in your architecture provides:

  1. Capacity Planning: See how much traffic the LB can handle before bottlenecks
  2. Failure Analysis: Understand what happens if the LB fails (single point of failure?)
  3. Algorithm Clarity: Document which algorithm and why
  4. Performance Visibility: Track RPS, latency, server distribution

Example: E-Commerce Platform Load Balancing

import { * } from 'sruja.ai/stdlib'

ECommerce = system "E-Commerce Platform" {
    description "Multi-tenant e-commerce with intelligent load balancing"
    
    // LAYER 4: Edge load balancer for SSL termination
    EdgeLB = container "Edge Load Balancer" {
        technology "HAProxy"
        description "Layer 4 LB for SSL termination and initial routing"
        tags ["load-balancer", "layer4"]
        
        capacity {
            requests_per_second "1_000_000"
            bandwidth_gbps "10"
        }
    }
    
    // LAYER 7: Application load balancer for content routing
    AppLB = container "Application Load Balancer" {
        technology "NGINX"
        description "Layer 7 LB for URL-based routing and session affinity"
        tags ["load-balancer", "layer7"]
        
        tradeoff {
            decision "Use NGINX (Layer 7) for application routing"
            sacrifice "Raw throughput (L4 would be 2x faster)"
            reason "Need URL-based routing: /catalog → catalog servers, /checkout → checkout servers"
            mitigation "Use L4 edge LB for SSL termination to reduce NGINX load"
        }
        
        slo {
            latency {
                p95 "50ms"
                window "7 days"
            }
            availability {
                target "99.99%"
                window "30 days"
            }
        }
    }
    
    // BACKEND SERVICES
    CatalogServer = container "Catalog Service" {
        technology "Python, Django"
        description "Product catalog (read-heavy)"
        tags ["service"]
        quantity 10
    }
    
    CheckoutServer = container "Checkout Service" {
        technology "Node.js"
        description "Checkout flow (stateful, requires session affinity)"
        tags ["service"]
        quantity 5
    }
    
    APIServer = container "API Service" {
        technology "Go"
        description "Public API (stateless, high throughput)"
        tags ["service"]
        quantity 20
    }
    
    // TRAFFIC FLOW
    EdgeLB -> AppLB "Distributes traffic (Round Robin)"
    AppLB -> CatalogServer "Routes /catalog (Least Connections)"
    AppLB -> CheckoutServer "Routes /checkout (IP Hash for session affinity)"
    AppLB -> APIServer "Routes /api (Least Connections)"
}

view index {
    title "E-Commerce Load Balancing Architecture"
    include *
}

view load-balancing {
    title "Load Balancer Configuration"
    include ECommerce.EdgeLB ECommerce.AppLB
}

Key Trade-Offs Documented

1. Layer Choice:

  • Why Layer 4 at edge? Raw speed for SSL termination
  • Why Layer 7 at app? Need URL-based routing and session affinity

2. Algorithm Selection:

  • Round Robin: For catalog (stateless, uniform requests)
  • IP Hash: For checkout (requires session stickiness)
  • Least Connections: For API (varying query complexity)

3. Performance vs Features:

  • Sacrifice raw throughput for intelligent routing
  • Mitigated by using layered approach (L4 + L7)

Knowledge Check

Q: My app needs to route /images to image servers and /api to API servers. Which load balancing layer should I use?

Layer 7 (Application Layer)

Layer 7 load balancers can inspect HTTP content (URLs, headers) and route based on that. Layer 4 only looks at IP/port and can't do content-based routing.

Q: I'm building a WebSocket chat app. Users connect and stay connected for hours. Which algorithm should I use?

IP Hash (or consistent hashing)

You need session affinity - when a user connects, they should stay on the same server. IP hash ensures the same IP always routes to the same server. Round robin would break the WebSocket connection when requests go to different servers.

Q: I have 10 identical servers handling high-throughput API requests. Speed is the priority. What algorithm?

Round Robin

When servers are identical and stateless, round robin is perfect. It's simple, fast, and gives even distribution. No need for IP hash (stateful) or least connections (servers have similar load).

Quiz: Test Your Knowledge

Q1: What type of load balancing operates at the transport layer and uses IP addresses and ports?

  • Layer 4 (Transport Layer)
  • Layer 7 (Application Layer)
  • Layer 3 (Network Layer)
Answer: **Layer 4 (Transport Layer)** operates at the transport layer and makes decisions based on IP addresses and TCP/UDP ports. Layer 7 works at the application layer and inspects HTTP content.

Q2: Which load balancing algorithm is best for applications requiring session stickiness?

  • Round Robin
  • Least Connections
  • IP Hash
Answer: **IP Hash** uses the client's IP address to determine which server receives the request, ensuring the same client always goes to the same server. This is essential for session stickiness in stateful applications like WebSockets or applications with in-memory sessions.

Q3: You're building an API gateway routing to microservices with varying response times. Which algorithm should you use?

  • Round Robin
  • Least Connections
  • IP Hash
Answer: **Least Connections** sends requests to the server with the fewest active connections. This is ideal when services have varying request processing times (some API calls are fast, others slow) because it accounts for actual server load rather than just distributing requests evenly.

Q4: Which of these is NOT a characteristic of Layer 7 load balancing?

  • Can inspect HTTP headers and URLs
  • Slower than Layer 4 due to content inspection
  • Cannot do SSL termination
  • More CPU intensive than Layer 4
Answer: **Cannot do SSL termination** is NOT a characteristic. Layer 7 load balancers can and commonly do perform SSL termination: they decrypt traffic at the load balancer, inspect the HTTP content, then route it. This is one of their key features.

Q5: NGINX uses a layered approach with Layer 4 at the edge and Layer 7 closer to applications. Why?

  • To reduce complexity
  • To balance speed and intelligence
  • To minimize costs
  • To reduce latency
Answer: **To balance speed and intelligence**. Layer 4 provides raw speed for SSL termination and initial routing (high throughput). Layer 7 provides intelligent features like URL-based routing closer to the applications. This layered approach gives you both performance (L4) and features (L7).

Next Steps

Now that we understand load balancing, let's learn about databases and how to choose between SQL and NoSQL. 👉 Lesson 2: Databases (SQL vs NoSQL, Replication, Sharding)