Wednesday, July 19, 2023

System Design 101 - CAP Theorem

CAP Theorem - When you cannot have everything at once

This theorem states that a distributed system can provide at most two of the following three guarantees simultaneously:
  • Consistency: all nodes in a distributed system see the same data at the same time
  • Availability: every request receives a response, though without a guarantee that the response is up to date (the system runs 24/7, with no downtime)
  • Partition tolerance: the system continues to function even when network partitions occur, that is, when machines on different networks can no longer reach each other

 

Suppose we have to design a simple system of 2 ATMs that are linked together. When a user withdraws money at ATM1, the new balance must be updated on ATM2 as well.

We cannot design a system that provides all three of C, A, and P. To have C (Consistency), every update must be applied on both ATM1 and ATM2 at once. But if the link between ATM1 and ATM2 is broken and a user withdraws at ATM1, ATM2 will be out of date, and C (Consistency) is violated. To prevent that, we have to stop ATM1 and ATM2 from operating until the link is restored and the data can be updated on both sides; this violates A (Availability).
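The scenario above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `ATM` class, its `link_up` flag, and the starting balance of 100 are made up for the example, and replication is modeled as a direct assignment rather than a real network call.

```python
class ATM:
    """One replica that keeps its own copy of the account balance (hypothetical model)."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.peer = None      # the other ATM
        self.link_up = True   # False simulates a network partition

    def withdraw(self, amount):
        self.balance -= amount
        # The update reaches the peer only while the link is up.
        if self.link_up and self.peer is not None:
            self.peer.balance = self.balance


atm1 = ATM("ATM1", 100)
atm2 = ATM("ATM2", 100)
atm1.peer, atm2.peer = atm2, atm1

# Link is up: both ATMs agree after the withdrawal.
atm1.withdraw(30)
balances_connected = (atm1.balance, atm2.balance)

# Link is broken: ATM1 keeps serving, ATM2 becomes stale.
atm1.link_up = atm2.link_up = False
atm1.withdraw(20)
balances_partitioned = (atm1.balance, atm2.balance)

print(balances_connected)
print(balances_partitioned)
```

After the second withdrawal the two ATMs disagree about the balance, which is exactly the consistency violation described above. The only way to avoid it in this model is to refuse the withdrawal while the link is down, which gives up availability.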

Now, we engineers have to give up one of the three. In the case of the ATM, we have the following combinations:

  • CA: Not worth considering, since in a distributed system Partition tolerance is a must. Without it, a broken link leaves the ATMs out of sync, which violates Consistency anyway.
  • CP: Consistency is a hard requirement, so when a partition occurs the system becomes unavailable: it stops serving requests because it cannot guarantee consistency.
  • AP: Prefer Availability and Partition tolerance over Consistency. The ATMs always work, even when the connection between the 2 ATMs is broken (a partition). The data is temporarily inconsistent, but this can be resolved later, when the 2 ATMs are connected again. So, for the ATM case, this is the best solution.
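To make the difference between CP and AP concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `ATM` class, the `mode` parameter, the `pending` list, and the `reconnect` helper are hypothetical names, and reconciliation is modeled as simply replaying the withdrawals the other side missed.

```python
class ATM:
    """Hypothetical ATM replica that behaves as CP or AP during a partition."""

    def __init__(self, name, balance, mode):
        self.name = name
        self.balance = balance
        self.mode = mode      # "CP" or "AP"
        self.peer = None
        self.link_up = True
        self.pending = []     # withdrawals the peer has not seen yet

    def withdraw(self, amount):
        if self.link_up:
            # Normal case: apply the withdrawal on both replicas.
            self.balance -= amount
            self.peer.balance -= amount
            return True
        if self.mode == "CP":
            # Stay consistent by refusing service (not available).
            return False
        # AP: keep serving and remember what must be synchronized later.
        self.balance -= amount
        self.pending.append(amount)
        return True


def make_pair(mode, balance=100):
    a, b = ATM("ATM1", balance, mode), ATM("ATM2", balance, mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b):
    a.link_up = b.link_up = False


def reconnect(a, b):
    """Restore the link and replay each side's missed withdrawals on the other."""
    a.link_up = b.link_up = True
    for src, dst in ((a, b), (b, a)):
        for amount in src.pending:
            dst.balance -= amount
        src.pending.clear()


# CP: withdrawals are rejected while the link is down, balances stay equal.
cp1, cp2 = make_pair("CP")
partition(cp1, cp2)
cp_accepted = cp1.withdraw(20)

# AP: both ATMs keep working, diverge, then converge after reconnecting.
ap1, ap2 = make_pair("AP")
partition(ap1, ap2)
ap1.withdraw(20)
ap2.withdraw(30)
ap_during = (ap1.balance, ap2.balance)
reconnect(ap1, ap2)
ap_after = (ap1.balance, ap2.balance)
```

In this sketch the AP pair ends up with the same balance on both sides once the link is back, while the CP pair never diverges but turns the customer away. Note that the AP variant does nothing to stop the combined withdrawals from exceeding the balance; handling that is outside the scope of this sketch.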

Designing a system is not an easy task: it is vague and abstract, requires estimation, and requires thinking about problems before they happen. In the real world, even after picking 2 elements out of 3, the balance between the 2 factors is not always 50-50. Sometimes, we have to make one factor more important than the other to reduce complexity. For example, there is no such thing as 100% availability, and after a partition is resolved, the system may need time to synchronize.


#systemdesign101
