#7 System Design: SQL Vs NO SQL

#7 System Design: SQL Vs NO SQL

Have you ever been in a situation to choose between SQL and NO SQL database ?

Whether you did or did not... Let's take a look at what I've got today to help us understand what their duties are and what their benefits and drawbacks are based on the use cases we'll encounter everyday.

SQL Databases:

We all know that SQL is all about rows and columns of a table that can be related to other table via relationships like foreign keys. They also have primary keys which are unique identifiers to find a single row in a table.

Table associations are critical for efficiently organizing and arranging a large amount of data.SQL is a query language that is both powerful and simple to understand when it comes to retrieving data.

Structure:

Data is structured by SQL schemas, which specify the columns and tables . Before saving anything, the data model and format must be known, which reduces the chance for error.

When compared to NoSQL databases, SQL databases take longer to start up since columns and tables must be built ahead of time. Unstructured data cannot be stored or queried in SQL databases (where the format is unknown).

ACID

Another benefit of SQL databases is that they are ACID (Atomicity, Consistency, Isolation, Durability) compliant, and they ensure that by supporting transactions. SQL transactions are groups of statements that are executed atomically. This means that they are either all executed, or not executed at all if any statement in the group fails. A easy example of a SQL transaction is written below

BEGIN TRANSACTION transfer_money_2000;
UPDATE AccountBalance SET balance = balance - 25 WHERE account_id = 10;
UPDATE AccountBalance SET balance = balance + 25 WHERE account_id = 200;
COMMIT TRANSACTION;

$25 is being transferred from one account balance to another.

If something happened, like network crash after deducting $25 from the first account 10 before it got added to account 200 then the entire transaction should roll back. Else it would be darn hard situation.

Scalability:

SQL databases are challenging to scale horizontally due to their relational nature. Adding numerous read-only replicas (with leader-follower replication) is simple for read-heavy systems, but for write-heavy systems, the only alternative is to vertically scale the database up, which is generally more expensive than provisioning more servers.

Leader Follower Replication:

image.png

A trade-off is made between consistency and availability by raising the number of read copies.Having more read servers increases availability, but at the cost of data consistency (assuming asynchronous updates), as there is a greater possibility of accessing outdated data. This was discussed well in the previous series of "CAP Theorem" image.png

NOSQL Databases:

Table relationships are not supported by NoSQL databases, therefore data is typically saved in documents or as key-value pairs. NoSQL databases are therefore more flexible, easier to set up, and a better solution for storing unstructured data.

Horizontal Scalability:

Without table relationships, data in NoSQL databases can be sharded across different data stores, allowing for distributed databases. This makes horizontal scaling much easier, and very large amounts of data can be stored without having to purchase a single, expensive server.

NoSQL databases can flexibly support both read-heavy and write-heavy systems. With data spread out across multiple shards/servers, hashing and consistent hashing are very important techniques for determining which shard(s) to route application queries to. image.png

Mongo DB uses a query router, which is a reverse proxy that accepts a query and routes it to the appropriate shard. After processing , the response is sent back to the caller/application.

This sounds like a job of an load balancer !!!

Yes it is :)

Consistency:

image.png

NoSQL databases are typically designed for distributed use cases, and write-heavy systems can be supported by having multiple write shards for the same data partition (called peer-to-peer replication).

However, the tradeoff is a loss of strong consistency.

After a write to a shard in a distributed NoSQL cluster, there’ll be a small delay before that update can be propagated to other replicas. During this time, reading from a replica can result in accessing stale data.

This weakness of the data eventually being up-to-date, a.k.a eventual consistency, was actually seen earlier with leader-follower replication (which can be used for SQL or NoSQL). Eventual consistency isn’t exactly a fault of NoSQL databases, but distributed databases in general. A single shard NoSQL database can be strongly consistent, but to fully take advantage of the scalability benefits of NoSQL, the database should be set up as a distributed cluster.

Examples of Popular NoSQL Databases: MongoDB, Redis, DynamoDB, Cassandra, CouchDB

Challenge Yourself:

Situation 1 :

Imagine you are building a application that will be accessed by millions of shoppers around the world. You need to store the data that the user has viewed previously. It's okay if the data is slightly out of date but since the customers are viewing lot of data you need to store this data in a database. What database will you use?

Answer : NO SQL since the data can be bit stale and that there is lot of purchase data to store.

Situation 2:

The engineering team has been constructing a lot of new REST services, and many of them send read requests to the primary SQL database regarding customer metrics, causing some scaling issues. To offload some of these read queries, the idea is to create a cache service that sits in front of the database. If the query hasn't been cached in the recent 10 minutes, this cache service should seek it up and query the database for the result, which it should go to cache as well. Otherwise, the cache service will just retrieve the result from the cache rather than searching the main database. What kind of database should the cache be built on?

Answer : NO SQL Because database should be able to look up the cached result by the SQL query (string) and the stored data is non-relational.

Situation 3:

The goal is to build a service at a Mortgage Company that can allow users to apply for loans through the app. The service needs a database to store loan applications. In addition to the loan amount, the application also needs information regarding the user’s current balance and prior transaction history. What kind of database should be used?
This is a financial application where data consistency is very important. Data about the loan, user’s balance and transaction history all need to be stored, and there’s relationships between these data.

Solution: SQL

Summary:

image.png

image.png

Did you find this article valuable?

Support Aishwarya Ravichandran by becoming a sponsor. Any amount is appreciated!