Distributed Systems: Sharding, Replication, and CAP Theorem
How do we make databases (and other applications) 1) scale to handle a large number of requests, and 2) be fault-tolerant? The answer is sharding (not sharting! 🤣) and replication. However, there are subtleties and trade-offs, which can be explained through the lens of the CAP theorem.
Sharding
Again, this isn’t the same as sharting! Please don’t shart in an attempt to make your database scale!
Ok, but seriously, one straightforward way to make a database scale and handle a large number of requests is to split your data into several databases, each on a different server! This is called sharding. You assign the server that a specific piece of data (row) goes to based on some criteria (e.g. based on ID, alphabetically, etc.).
Server 1 Server 2
+-------------------+ +-------------------+
| Database 1 | | Database 2 |
| | | |
| Users A-M | | Users N-Z |
| | | |
+-------------------+ +-------------------+
For example, if you have a database of users (where each row is a user along with some associated info), you can split the rows amongst several databases, hosted in several servers, based on some criteria of each row, for example, the first letter of the user’s name. This way, you would have database1 in server1 be responsible for storing users A-M, and database2 in server2 be responsible for storing users N-Z. Of course, you can split the data amongst more servers, and based on different criteria, but the core concept remains: split the data amongst several databases/servers based on some criteria of each row.
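To make that concrete, here's a minimal sketch in Python of how a client might route each user to a shard based on the first letter of their name. The SHARDS map, the connection strings, and save_user are all made up for illustration, not any real database's API.

SHARDS = {
    "server1": "postgres://server1/users_db",   # users A-M
    "server2": "postgres://server2/users_db",   # users N-Z
}

def pick_shard(username):
    """Return the server responsible for this user, based on the first letter."""
    first_letter = username[0].upper()
    return "server1" if "A" <= first_letter <= "M" else "server2"

def save_user(username, info):
    shard = pick_shard(username)
    # In a real app you'd open a connection to SHARDS[shard] and INSERT the row here.
    print(f"writing {username!r} to {shard} ({SHARDS[shard]})")

save_user("Alice", {"email": "alice@example.com"})   # goes to server1
save_user("Zoe", {"email": "zoe@example.com"})       # goes to server2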
Replication
By putting copies of your database on multiple servers, you can distribute requests amongst these servers and thus handle a large number of requests. Furthermore, if one server fails, the other servers can take over. This gives you both scalability and fault-tolerance! This is called replication.
+------------+ +------------+ +------------+
| | | | | |
| Server 1 | | Server 2 | | Server 3 |
| Database | | Database | | Database |
| | | | | |
+------------+ +------------+ +------------+
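Here's a rough sketch in Python of the read side of the diagram above: spread read requests across the replicas in round-robin fashion. The replica addresses and read_user are hypothetical placeholders, not a particular client library's API.

import itertools

# Hypothetical replica addresses; any one of them can serve a read.
REPLICAS = ["server1:5432", "server2:5432", "server3:5432"]
replica_cycle = itertools.cycle(REPLICAS)

def read_user(username):
    replica = next(replica_cycle)   # round-robin: spread reads across the replicas
    # In a real client you'd run a SELECT against this replica here.
    print(f"reading {username!r} from {replica}")

for name in ["Alice", "Bob", "Carol", "Dave"]:
    read_user(name)   # requests rotate through server1, server2, server3, server1, ...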
But there is a subtlety here. What happens when you need to write? If you write to one server, you need to write to all of them; otherwise some servers will have outdated data, and they will serve this outdated data to clients making requests to them. You have a couple of options here.
Eventual Consistency
You can start the write process and then return to the client, without waiting for the write to be completed on all servers. This means that between the time you return and the time the write is completed on all servers, some servers will have the old data, and they will return this old data to clients. This is called eventual consistency, because eventually all servers will have the new data, but for a period of time after a write some servers will have stale data. You just have to make sure clients know this and can deal with it. Generally, you make a time guarantee to clients, for example, “the data will be consistent within 5 seconds”. This helps clients because they know what to expect.
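As a toy illustration of that window of staleness, here's a sketch that assumes in-memory dicts as "replicas" and a fake 2-second replication delay: the write returns to the client before the other replicas have caught up.

import threading
import time

# Three toy "replicas": just dicts in memory.
replicas = [{}, {}, {}]

def write(key, value):
    replicas[0][key] = value                    # write one copy right away
    def propagate():
        time.sleep(2)                           # pretend the network/replication is slow
        for replica in replicas[1:]:
            replica[key] = value                # the other replicas catch up eventually
    threading.Thread(target=propagate).start()
    return "ok"                                 # return to the client before propagation finishes

def read(key, replica_index):
    return replicas[replica_index].get(key)     # may be stale for a little while

write("user:42", "new value")
print(read("user:42", 1))   # probably None (stale) right after the write
time.sleep(3)
print(read("user:42", 1))   # "new value" once propagation has completed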
Strong Consistency (Definition)
The above is eventual consistency, which is a form of weak consistency. This is in contrast to strong consistency. A strong consistency guarantee means that after a write is committed, all subsequent reads, regardless of which server they go to, will get the updated value.
Strong Consistency (Fake)
One way to achieve strong consistency is to wait for the write to be completed on all servers before processing any subsequent requests. But this means that all requests have to funnel through a single point of failure, which determines whether each request is a read or a write. If the request is a read, it delegates to one of the servers. If the request is a write, it stops processing requests until the write is completed on all servers. This would get you strong consistency, but it's impractical because everything depends on that single point of failure.
+--------------------------------+
| Single Point of Failure Server | # all requests funnel through here
+--------------------------------+
|
+---------------------------+
| | |
v v v
+------------+ +------------+ +------------+
| | | | | |
| Server 1 | | Server 2 | | Server 3 |
| Database | | Database | | Database |
| | | | | |
+------------+ +------------+ +------------+
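A toy version of that single funnel might look like the sketch below: one lock that every request goes through, and a write holds it until every replica is updated. The lock, the in-memory replicas, and the function names are all just for illustration. It's strongly consistent, but everything hinges on that one coordinator.

import random
import threading

replicas = [{}, {}, {}]
lock = threading.Lock()   # every single request funnels through this one lock

def handle_write(key, value):
    with lock:                             # nothing else proceeds until...
        for replica in replicas:
            replica[key] = value           # ...the write has hit every replica

def handle_read(key):
    with lock:
        replica = random.choice(replicas)  # any replica is safe to read from now
        return replica.get(key)

handle_write("user:42", "new value")
print(handle_read("user:42"))   # always "new value", but everything went through one coordinator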
Strong Consistency (Real)
The real way to get strong consistency is to use a distributed consensus algorithm, such as Paxos or Raft. These algorithms allow a group of servers to agree on a value, even if some of the servers are down. Here’s how it works.
When a write is made, the consensus algorithm talks to the servers until a majority of them agree (some of the servers can be down). Once a majority of the servers agree, the write is considered committed. After this, all subsequent reads will get the updated value. Period.
If a server was down while the majority agreement and commit happened, then as soon as it comes back up, the first thing it does is ask the other servers whether it is behind on data, and if so, to catch it up. It does not process any requests until it is caught up. This ensures that even servers that were down during the consensus will have the updated value once they rejoin.
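The sketch below is not Paxos or Raft themselves (both involve leaders, terms, and replicated logs), but it shows the majority-quorum idea they are built on, with 5 hypothetical servers: a write only counts as committed once more than half of the servers have acknowledged it.

NUM_SERVERS = 5
MAJORITY = NUM_SERVERS // 2 + 1   # 3 out of 5

def try_write(key, value, servers_up):
    """servers_up: the set of servers that are reachable right now."""
    acks = 0
    for server in range(NUM_SERVERS):
        if server in servers_up:
            acks += 1    # pretend we sent the write to this server and got an ack back
    if acks >= MAJORITY:
        return "committed"   # a majority agreed, so all subsequent reads see the new value
    return "rejected"        # no majority reachable, so the write does not go through

print(try_write("user:42", "new value", servers_up={0, 1, 2}))   # committed (3 of 5 acked)
print(try_write("user:42", "new value", servers_up={0, 1}))      # rejected (only 2 of 5 acked)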
CAP Theorem
The CAP theorem states that a distributed system can only have two of the following three properties: consistency, availability, and partition tolerance.
- Consistency: All nodes see the same data at the same time. In other words, a read will return the most recent write.
- Availability: Every request gets a response (even if it isn’t the most recent write).
- Partition Tolerance: The system continues to operate even if network connectivity between some nodes is lost.
To be more precise, you can only have 100% of two of these properties; you must sacrifice some percentage of the third.
For example, our eventual consistency model sacrifices a bit of consistency. Sometimes reads return stale data. But if you give it enough time, each write eventually reaches every server and the staleness resolves itself.
The consensus algorithm we talked about sacrifices a bit of availability and even some partition tolerance. We know that in order for a write to be committed, a majority of the servers must agree, so in order for a write to succeed, a majority of the servers must be able to communicate. You can tolerate some network partition, but more than half the nodes must remain connected, which is a reduction of partition tolerance. Furthermore, this is a reduction of availability, because if a majority of the servers are down, no writes can be made. So consensus algorithms get you strong consistency at the cost of some availability and partition tolerance.
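To put a rough number on that trade-off, here's a tiny back-of-the-envelope check, assuming the same 5-server majority setup as the sketch above: as long as the largest connected group still holds a majority, writes keep working; once no group does, writes stop.

NUM_SERVERS = 5
MAJORITY = NUM_SERVERS // 2 + 1   # 3 out of 5

def can_commit_writes(partition_sizes):
    """partition_sizes: how many servers ended up in each isolated group."""
    return max(partition_sizes) >= MAJORITY

print(can_commit_writes([5]))        # True  - no partition, everyone connected
print(can_commit_writes([3, 2]))     # True  - the group of 3 still forms a majority
print(can_commit_writes([2, 2, 1]))  # False - no group has a majority, so writes stop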
Generalizing to All Applications
We spoke of these concepts in the context of databases, but they apply to all distributed systems. Instead of your data being in a database, they could be in files in the various servers, they could be in the memory of various processes, etc. It doesn’t matter, you have some data across multiple servers, in a distributed manner. The same concepts apply.
- you can shard (not shart! hahaha)
- you can replicate
- and choose to have eventual consistency
- or use a consensus algorithm for strong consistency but sacrifice some availability and partition tolerance
Summary
- the most straightforward way to scale a database is to split the data into multiple databases across multiple servers, based on some criteria of each row. This is called sharding.
- another way to scale is to put copies of your database on multiple servers. This is called replication. In addition to scalability, you gain fault-tolerance.
- however, you have to choose either eventual consistency
- or strong consistency, which requires a consensus algorithm, but sacrifices some availability and partition tolerance
- these concepts apply to all distributed systems, not just databases
- instead of database rows, you could have any data (files, data in memory, etc)
Thanks for stopping by, and I hope it was a fun read!