#5 System Design: CAP Theorem

Backend Developer and a busy mom who loves technology and sharing knowledge that makes her fulfilling and happy
Trust me this is going to be super easy ! No Math :)
This is a very important database concept to understand in system design.
Will it teach me how to choose a database?
What I should consider while choosing one?
Why should I know this?
Hey ! slow down.. we are getting there !!!
Yes Yes and Yes.. It can help you narrow down the type of databases based on use case and deal with "fault tolerance".
In the distributed architectures, we always have more than one database on a network.
Since network failures ie "network partition" is unavoidable, its important to understand the impact of these failures beforehand at the time of design, when one of the database becomes unresponsive or crashes or is down .. blah .. blah....
offcourse ! these failures are temporary
But, we gotta still account for those upfront !!!
C - Consistency A - Availability P- Partition Tolerance
What does each one mean ?
Consistency:
It means that all the clients see the same data at the same time, no matter what database/node they are connecting to.
When data is written to one of the database/node it must be instantly replicated to other nodes in the network before the write is deemed successful.
Its All or nothing operation !!

As you see in the above picture the "write" is succeeded when there is "hello" on both the nodes. It fails if node A can't write to node B on the event of a network failure.
"No room for success" if one of the node fails. Prioritizing consistency !!!
Like the last benchers in a classroom.. either everyone passes the test or every one fails ;)
Availability
It means that any client making a "write" request will get a valid response(without errors) for sure even if there is a network failure or if one of the database/node is down.
No matter if you friend works hard or snoozes during an exam. You'll still give your best to pass the exam! ;)

As you saw above while writing to the database, the write was successful only to node A, and it could not be replicated to node B due to a network failure, but still we returned a valid response to the user.
Prioritizing Availability at the cost of consistency !
Did you mean to say if I connected both the databases. They don't have the same data.. no uniformity !!!?
Yes! for a while they will be stale, until the affected node is back healthy.
So if your situation doesn't care about the consistency of the data, then in that case you'll not have to say "Shop is closed" to your users even if one fails :)
Partition Tolerance:
A partition is a term used to describe communications break within distributed systems.
"Partition Tolerance" means that the cluster should continue to work despite these failures or communication breakdowns between the systems.
When should Consistency or Availability be prioritized?
If you’re working with data that you know needs to be up-to-date, then it may be better to store it in a database that prioritizes consistency over availability. On the flip side, if it’s fine that the queried data can be slightly out-of-date, then storing it in an available database may be the better choice
Read Requests
Why did we just speak about writes !!! How about reads ??
Well, read doesn't affect the state of the data as such.( unless you write and immediately read it during a network partition!!!). So it doesn't need re-syncing between the nodes.
Read requests are typically fine during network partitions for both consistent and available databases.
SQL Databases
Prioritizes Consistency :
SQL databases like MySQL, PostgreSQL, Microsoft SQL Server, Oracle, etc, usually prioritize consistency.
Master-slave replication is a common distributed architecture in SQL databases, and in the event of a master becoming unavailable, the role of master would failover to one of the replica nodes. During this failover process and electing a new master node, the database cannot be written to, so that consistency is preserved.
Prioritizes Availability :
Couch DB, Cassandra, AWS Dynamo DB etc
C Vs A Databases

Does Consistency in CAP mean Strong Consistency?
So does it mean that the replication is faster than lightening in "Consistent" databases so that connecting to read a data immediately after a write gives an accurate result?

In a strongly consistent database, if data is written and then immediately read after, it should always return the updated data. The problem is that in a distributed system, network communication doesn’t happen instantly, since nodes/servers are physically separated from each other and transferring data takes >0 time. This is why it’s not possible to have a perfectly, strongly consistent distributed database. In the real world, when we talk about databases that prioritize consistency, we usually refer to databases that are eventually consistent, with a very short, unnoticeable lag time between nodes.
Happy learning !!!!




