Lesson 4: CAP Theorem & Consistency

The CAP Theorem

Proposed by Eric Brewer, the CAP theorem states that a distributed data store can provide at most two of the following three guarantees at the same time:

  1. Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time.
  2. Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
  3. Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.

The Reality: P is Mandatory

In a distributed system, network partitions (P) are inevitable. Therefore, you must choose between Consistency (CP) and Availability (AP) when a partition occurs.

  • CP (Consistency + Partition Tolerance): Wait until the data is in sync before answering. If the nodes needed to confirm the latest value are unreachable, return an error (e.g., banking systems).
  • AP (Availability + Partition Tolerance): Return the most recent version of the data available on the reachable node, even if it might be stale (e.g., social media feeds).
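To make the trade-off concrete, here is a minimal Python sketch of how a single replica might answer a read while it is cut off from its peers. The `Replica` class, its `mode` and `partitioned` fields, and the `Unavailable` exception are hypothetical names invented for this illustration. Real databases make this decision with quorum and failure-detection logic rather than a boolean flag.

```python
from dataclasses import dataclass


class Unavailable(Exception):
    """Raised by a CP replica that cannot confirm it holds the latest data."""


@dataclass
class Replica:
    mode: str                  # "CP" or "AP" (labels assumed for this sketch)
    value: str                 # last value this replica applied locally
    partitioned: bool = False  # True when the replica cannot reach its peers

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise Unavailable("cannot reach peers to confirm the latest value")
        # AP (or no partition): answer from local state, which may be stale.
        return self.value


def demo() -> list:
    """Read from a partitioned CP replica and a partitioned AP replica."""
    results = []
    for mode in ("CP", "AP"):
        replica = Replica(mode=mode, value="balance=100", partitioned=True)
        try:
            results.append(f"{mode}: {replica.read()}")
        except Unavailable:
            results.append(f"{mode}: error")
    return results


if __name__ == "__main__":
    print(demo())
```

Under these assumptions, the partitioned CP replica raises an error while the partitioned AP replica answers with whatever it last stored. Without a partition, both behave identically.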

Consistency Models

  • Strong Consistency: Once a write is confirmed, every subsequent read, from any node, sees that value (or a newer one).
  • Eventual Consistency: If no new updates are made, eventually all accesses will return the last updated value. (Common in AP systems).
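The difference between the two models can be sketched with a toy replicated store in Python. Everything here (`EventuallyConsistentStore`, `write`, `write_strong`, `sync`) is a hypothetical illustration, not the API of any real database. Replication is simulated by an explicit `sync()` call that stands in for background propagation.

```python
class EventuallyConsistentStore:
    """Toy store with several replicas of a single value."""

    def __init__(self, replica_count=3):
        self.replicas = [None] * replica_count
        self.pending = []  # queued (replica_index, value) replication messages

    def write(self, value):
        # Eventual consistency: acknowledge once replica 0 has the value,
        # and queue the copies to the other replicas for later delivery.
        self.replicas[0] = value
        self.pending = [(i, value) for i in range(1, len(self.replicas))]

    def write_strong(self, value):
        # Strong consistency (simplified): update every replica before
        # acknowledging, so any later read sees the new value.
        self.replicas = [value] * len(self.replicas)
        self.pending.clear()

    def read(self, replica_index):
        return self.replicas[replica_index]

    def sync(self):
        # Deliver all queued replication messages (simulated background sync).
        for index, value in self.pending:
            self.replicas[index] = value
        self.pending.clear()


if __name__ == "__main__":
    store = EventuallyConsistentStore()
    store.write("v1")
    print(store.read(0), store.read(2))  # replica 2 has not received "v1" yet
    store.sync()
    print(store.read(0), store.read(2))  # all replicas have converged
```

After `write`, a read from replica 2 can still return the old state until `sync()` runs. After `write_strong`, every replica returns the new value immediately, at the cost of waiting for all of them.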

🛠️ Sruja Perspective: Documenting Guarantees

When defining data stores in Sruja, it is helpful to document their consistency guarantees, especially for distributed databases.

import { * } from 'sruja.ai/stdlib'


DataLayer = system "Data Layer" {
    UserDB = database "User Database" {
        technology "Cassandra"
        // Explicitly stating the consistency model
        description "Configured with replication factor 3. Uses eventual consistency for high availability."

        // You could also use custom tags
        tags ["AP-System", "Eventual-Consistency"]
    }

    BillingDB = database "Billing Database" {
        technology "PostgreSQL"
        description "Single primary with synchronous replication to ensure strong consistency."
        tags ["CP-System", "Strong-Consistency"]
    }
}

view index {
    include *
}

Quiz: Test Your Knowledge

Ready to apply what you've learned? Take the interactive quiz for this lesson!

1. What term in the CAP theorem means every read receives the most recent write or an error (all nodes see the same data)?

Answer: Consistency

Alternative answers:

  • consistency
  • C

Explanation: Consistency ensures all nodes see the same data at the same time. When a write is confirmed, any subsequent read returns that value.


2. What term in the CAP theorem means every request receives a non-error response, without guaranteeing it contains the most recent write?

Answer: Availability

Alternative answers:

  • availability
  • A

Explanation: Availability means the system is always responsive. Even if some nodes are out of sync, the system returns a response (possibly stale data) rather than an error.


3. What term in the CAP theorem means the system continues to operate despite network failures or message loss between nodes?

Answer: Partition Tolerance

Alternative answers:

  • partition tolerance
  • partition-tolerance
  • P

Explanation: Partition Tolerance ensures the system works even when network communication between nodes fails. In distributed systems, partitions are inevitable, so P is mandatory.


4. In a distributed system, you must choose between CP and AP when a network partition occurs. Why?

  • a) Because P is optional in most systems
  • b) Because you can only implement two of the three guarantees simultaneously
  • c) Because C and A are mutually exclusive by definition
  • d) Because network partitions (P) are inevitable in distributed systems


5. A banking system must ensure that account balances are always correct. During a network partition, the system rejects transactions if it can't confirm data consistency. This is what type of system?

  • a) AP (Available) - better to allow transactions with possibly incorrect balances
  • b) CA (Consistent and Available) - possible in single-node systems only
  • c) P (Partition Tolerance) only - data consistency isn't important
  • d) CP (Consistent and Partition Tolerant)


6. A social media feed shows posts from friends. If a partition occurs, users see slightly outdated posts rather than an error page. This is what type of system?

  • a) CP (Consistent) - reject requests if data isn't perfectly synced
  • b) CA (Consistent and Available) - impossible in distributed systems
  • c) P (Partition Tolerance) only - doesn't describe the full trade-off
  • d) AP (Available and Partition Tolerant)


7. Which system would prioritize CP (Consistency) over Availability?

  • a) Instagram photo feed
  • b) YouTube video recommendations
  • c) E-commerce product catalog (non-critical)
  • d) PayPal payment processing


8. Which system would prioritize AP (Availability) over Consistency?

  • a) Banking transaction system
  • b) Inventory management for critical medical supplies
  • c) Stock trading platform for high-frequency trading
  • d) Netflix video streaming recommendations


9. What is the difference between Strong Consistency and Eventual Consistency?

  • a) Strong Consistency is slower, Eventual is always faster
  • b) Strong Consistency allows stale reads, Eventual doesn't
  • c) There's no difference, they're synonyms
  • d) Strong Consistency: all nodes see same data immediately. Eventual: nodes eventually converge if no new writes occur.


10. A user posts a tweet. The tweet immediately appears in their own timeline but takes up to 30 seconds to appear in followers' feeds. What consistency model is this?

  • a) Strong Consistency (everyone sees the tweet immediately)
  • b) No consistency (data is randomly shown or hidden)
  • c) CP system (rejects posts during partitions)
  • d) Eventual Consistency (followers eventually see the tweet)


11. A database has a replication factor of 3 (3 copies of data). Writes are confirmed after writing to 2 nodes. If one node is down, what happens?

  • a) Write fails because all 3 nodes must be available
  • b) Write succeeds only after the failed node recovers and rejoins
  • c) System becomes completely unavailable
  • d) Write succeeds because quorum (2 out of 3 nodes) is available


12. Cassandra is a distributed database designed for AP systems. If you need strong consistency in Cassandra, what configuration would you use?

  • a) Replication factor of 1 (single node)
  • b) Write with consistency level ANY (a hinted handoff counts as success)
  • c) Set consistency level to ONE (fast but weak)
  • d) Read/write with QUORUM consistency level (majority of replicas)


13. You're designing a global e-commerce platform with product catalogs, user sessions, and order processing. Which should use strong consistency?

  • a) Product catalog (catalog changes are frequent)
  • b) User sessions (session data isn't critical)
  • c) Search results (stale results are acceptable)
  • d) Order processing and inventory management


14. In a globally distributed system, the one-way network latency between regions is 200ms. If you need strong consistency across regions, what's the minimum write latency?

  • a) 0ms (write happens locally and asynchronously replicates)
  • b) 50ms (compression reduces latency)
  • c) 200ms (only need to write to one region)
  • d) 400ms+ (must wait for acknowledgment from majority of regions)


15. What's the difference between BASE (Basically Available, Soft state, Eventual consistency) and ACID (Atomicity, Consistency, Isolation, Durability)?

  • a) BASE is stricter than ACID, requiring perfect consistency
  • b) They're synonyms, just different names for the same concept
  • c) BASE is for distributed systems, ACID is only for single-node databases
  • d) BASE is an alternative to ACID that prioritizes availability over strong consistency


16. A read-after-write consistency model ensures that a client always sees their own writes. What consistency level is this?

  • a) Weak Consistency (writes might not be visible)
  • b) Strong Consistency (all clients see all writes immediately)
  • c) Eventual Consistency (eventually converges)
  • d) Causal Consistency (session consistency)


17. In Sruja, how would you document that a database uses eventual consistency for high availability?

  • a) Use a 'relaxed' tag in the relationship
  • b) Don't model it - Sruja assumes strong consistency by default
  • c) Create two databases and say they're 'somewhat consistent'
  • d) Add tags like 'AP-System' and 'Eventual-Consistency' to the database definition


18. A system needs 99.999% availability but can tolerate 5-second data staleness. What's the best approach?

  • a) Strong consistency with synchronous replication across all nodes
  • b) Single-node database (simplest, no network issues)
  • c) Reject all writes during network partitions
  • d) Eventual consistency with asynchronous replication and caching


19. What happens when a CP system experiences a network partition?

  • a) The system continues serving all requests with slightly stale data
  • b) The system becomes completely unavailable (no data can be read or written)
  • c) The system switches to AP mode automatically
  • d) The system rejects requests that can't be guaranteed to be consistent (returns errors)


20. A distributed database has 5 nodes. Network partition splits them: 3 nodes in group A, 2 nodes in group B. What happens in a CP system?

  • a) Both groups accept writes (both available)
  • b) Group B accepts writes (it's smaller, so it's backup)
  • c) Neither group can operate (complete system failure)
  • d) Only group A (3 nodes, majority) can accept writes. Group B is read-only or unavailable.


21. What's the relationship between latency and consistency in distributed systems?

  • a) Strong consistency always has lower latency than eventual consistency
  • b) Eventual consistency always has lower latency, regardless of design
  • c) Latency and consistency are independent - no relationship exists
  • d) Strong consistency typically requires more coordination, resulting in higher latency


This quiz covers:

  • CAP theorem definitions (Consistency, Availability, Partition Tolerance)
  • Why P is mandatory in distributed systems
  • CP vs AP systems
  • Real-world examples (PayPal, Netflix, Twitter)
  • Strong vs Eventual Consistency
  • Consistency levels in Cassandra (ONE, QUORUM, ALL)
  • Replication and quorum
  • Global distributed systems and latency
  • BASE vs ACID
  • Causal consistency
  • Network partitions and quorum
  • Latency vs consistency trade-offs
  • Sruja modeling for consistency
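The quorum topics listed above come down to simple arithmetic, which the following Python sketch works through. The function names are hypothetical, and the model is deliberately simplified: it counts reachable replicas only and ignores timeouts, hinted handoff, and repair. The overlap rule R + W > N is the usual textbook condition for a read quorum to intersect the latest write quorum.

```python
def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def write_succeeds(replicas, write_quorum, nodes_down):
    """A write succeeds if enough replicas remain reachable to meet the quorum."""
    return replicas - nodes_down >= write_quorum


def reads_see_latest_write(replicas, read_quorum, write_quorum):
    """Read and write sets share at least one replica when R + W > N."""
    return read_quorum + write_quorum > replicas


def writable_groups(group_sizes):
    """In a majority-quorum CP system, only a group holding a majority accepts writes."""
    total = sum(group_sizes)
    return [size >= majority(total) for size in group_sizes]


if __name__ == "__main__":
    # Replication factor 3, writes confirmed by 2 nodes, one node down.
    print(write_succeeds(replicas=3, write_quorum=2, nodes_down=1))
    # QUORUM reads and writes with 3 replicas: 2 + 2 > 3.
    print(reads_see_latest_write(replicas=3, read_quorum=2, write_quorum=2))
    # Five nodes split into groups of 3 and 2.
    print(writable_groups([3, 2]))
```

With three replicas, a majority is two nodes, so one failure still leaves a write quorum. With five nodes split 3 and 2, only the three-node side holds a majority.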