system-design-101/data/guides/a-crash-course-in-database-sharding.md at ee4b7305a2ccd6aa13688559d345e8a6647774de

Kamran Ahmed ee4b7305a2 Adds ByteByteGo guides and links (#106 )

2025-03-31 22:16:44 -07:00

What is Database Sharding?

Database sharding is a type of database partitioning that separates very large databases into smaller, faster, more easily managed parts called data shards. The word shard means a small piece of something bigger.

Sharding is a form of horizontal partitioning: the rows of a table are split across shards, typically placed on separate servers. Each shard contains a portion of the data, and all shards together contain all of the data.

Why Sharding?

Sharding addresses these problems by spreading data and load across several machines:

Too much data on one machine: A single database server has finite disk and memory, so it can only hold so much data.
Too many requests on one machine: A single database server can only serve so many queries per second.
High latency: As tables and indexes grow, queries on a single server slow down.

How Sharding Works

Sharding involves splitting a database into multiple, independent parts (shards) and distributing them across different servers or machines. Each shard contains a subset of the data, and all shards collectively hold the entire dataset.
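As a minimal sketch of the idea, the Python below splits ten hypothetical user rows across three in-memory shards and checks that every row lands in exactly one of them. The `user_id % NUM_SHARDS` rule and the names `assign_shard` and `split_into_shards` are illustrative assumptions, not a prescribed scheme.

```python
NUM_SHARDS = 3  # hypothetical shard count

def assign_shard(user_id: int) -> int:
    # Illustrative placement rule; a real system uses its chosen sharding algorithm.
    return user_id % NUM_SHARDS

def split_into_shards(rows: list) -> list:
    # Each inner list stands in for an independent shard on its own server.
    shards = [[] for _ in range(NUM_SHARDS)]
    for row in rows:
        shards[assign_shard(row["user_id"])].append(row)
    return shards

rows = [{"user_id": i, "name": f"user{i}"} for i in range(1, 11)]
shards = split_into_shards(rows)

for index, shard in enumerate(shards):
    print(f"shard {index}: {[row['user_id'] for row in shard]}")
# shard 0: [3, 6, 9]
# shard 1: [1, 4, 7, 10]
# shard 2: [2, 5, 8]

# Together the shards hold the whole dataset, with no row stored twice.
assert sum(len(shard) for shard in shards) == len(rows)
```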

Sharding Key

A sharding key is a column (or set of columns) in the database table that determines how rows are distributed across the shards. The sharding algorithm uses its value to decide which shard a particular row should be stored in.
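To illustrate why the choice of key matters, here is a small sketch that routes the same hypothetical table through a hash-based rule twice: once keyed on `user_id`, once on a low-cardinality `country` column. The table, `NUM_SHARDS = 4`, and the helpers `stable_hash` and `shard_for` are assumptions for illustration; md5 is used only because it gives the same result in every process, unlike Python's built-in `hash()` for strings.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count

def stable_hash(value: str) -> int:
    # Deterministic across processes; used for spreading keys, not for security.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

def shard_for(row: dict, key_columns: tuple) -> int:
    # A sharding key may be one column or several; join their values into one key.
    key = "|".join(str(row[col]) for col in key_columns)
    return stable_hash(key) % NUM_SHARDS

# Hypothetical table: 1,000 users but only two distinct countries.
rows = [{"user_id": i, "country": "CA" if i % 10 == 0 else "US"} for i in range(1000)]

by_user = Counter(shard_for(r, ("user_id",)) for r in rows)
by_country = Counter(shard_for(r, ("country",)) for r in rows)

print("key=user_id:", dict(sorted(by_user.items())))
print("key=country:", dict(sorted(by_country.items())))
```

Keyed on `country`, every row lands in at most two of the four shards, and at least 90% of rows share one shard; keyed on `user_id`, rows spread across all four.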

Sharding Algorithm

The sharding algorithm is the logic that maps a row's sharding key to the shard that stores it. Reads use the same mapping, so a query that includes the sharding key can be routed directly to a single shard.

Here are some common sharding algorithms:

Range-based sharding: Data is divided into ranges based on the sharding key. For example, users with IDs from 1 to 1000 might be stored in shard 1, users with IDs from 1001 to 2000 in shard 2, and so on.
Hash-based sharding: A hash function is applied to the sharding key to determine the shard. For example, shard_id = hash(user_id) % num_shards.
Directory-based sharding: A lookup table (or directory) is used to map sharding keys to shard locations.
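The three algorithms above can be sketched in a few lines each. This is a minimal illustration, not a production router: the range boundaries mirror the 1-1000 / 1001-2000 example, md5 stands in for `hash()` so results are stable across processes, and the shard count and tenant names in the directory are hypothetical.

```python
import bisect
import hashlib

# Range-based: upper bounds mirror the 1-1000 / 1001-2000 example.
# Index 0 corresponds to "shard 1" in the text, index 1 to "shard 2", and so on.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]  # hypothetical boundaries

def range_shard(user_id: int) -> int:
    # bisect_left finds the first range whose upper bound is >= user_id.
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if index == len(RANGE_UPPER_BOUNDS):
        raise KeyError(f"no shard covers user_id {user_id}")
    return index

# Hash-based: shard_id = hash(user_id) % num_shards.
NUM_SHARDS = 4  # hypothetical

def hash_shard(user_id: int) -> int:
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Directory-based: an explicit lookup table from key to shard location.
# Here it is an in-memory dict; a real deployment would persist it.
DIRECTORY = {"tenant-a": "shard-1", "tenant-b": "shard-2", "tenant-c": "shard-1"}

def directory_shard(tenant_id: str) -> str:
    return DIRECTORY[tenant_id]

print(range_shard(1), range_shard(1000), range_shard(1001))  # 0 0 1
print(hash_shard(42))
print(directory_shard("tenant-c"))  # shard-1
```

Range lookups keep neighboring keys together, which helps range scans but can create hotspots; hashing spreads keys evenly but scatters ranges; a directory is the most flexible but adds a lookup on every request.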

Sharding Approaches

Application-level sharding: The application is responsible for determining which shard to use for each query.
Middleware sharding: A middleware layer sits between the application and the database and handles the sharding logic.
Database-native sharding: The database system itself provides sharding capabilities.
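As a sketch of application-level sharding, the hypothetical `ShardRouter` class below owns the routing decision itself. Each shard is an in-memory SQLite database standing in for a separate server, which is an assumption made so the example runs anywhere; middleware and database-native approaches move this same routing logic out of the application.

```python
import hashlib
import sqlite3
from typing import Optional

class ShardRouter:
    """Application-level sharding: the application decides which shard serves
    each query. Each shard is an in-memory SQLite database standing in for a
    separate server (an assumption for illustration)."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for conn in self.shards:
            conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")

    def _shard_for(self, user_id: int) -> sqlite3.Connection:
        # Hash-based routing on the sharding key user_id.
        digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def insert_user(self, user_id: int, name: str) -> None:
        conn = self._shard_for(user_id)
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (user_id, name) VALUES (?, ?)", (user_id, name))

    def get_user(self, user_id: int) -> Optional[str]:
        # The sharding key is known, so only one shard is queried.
        row = self._shard_for(user_id).execute(
            "SELECT name FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def count_users(self) -> int:
        # No sharding key in the query, so it must fan out to every shard.
        return sum(
            conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] for conn in self.shards
        )

router = ShardRouter(num_shards=3)
for user_id, name in [(1, "ada"), (2, "grace"), (3, "linus")]:
    router.insert_user(user_id, name)

print(router.get_user(2))    # grace
print(router.count_users())  # 3
```

Note the contrast between `get_user`, which touches one shard, and `count_users`, which has to query all of them and combine the results.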

Sharding Challenges

Increased Complexity: Sharding adds complexity to the database infrastructure and application code.
Data Distribution: Choosing the right sharding key and algorithm is crucial for even data distribution and performance; a poor choice concentrates data or traffic on a few "hot" shards.
Joins and Transactions: Performing joins and transactions across shards can be challenging and may require distributed transaction management.
Resharding: Changing the sharding scheme after the database has been sharded can be a complex and time-consuming process.
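To make the resharding cost concrete, this sketch counts how many keys change shards under the modulo rule `hash(key) % num_shards` when the shard count changes. The 10,000 keys are hypothetical, and md5 stands in for a uniform, process-stable hash.

```python
import hashlib

def shard_of(key: str, num_shards: int) -> int:
    # Modulo-based hash sharding: shard_id = hash(key) % num_shards.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

keys = [f"user:{i}" for i in range(10_000)]  # hypothetical keys

def moved_fraction(old_shards: int, new_shards: int) -> float:
    # Fraction of keys whose shard changes when the shard count changes.
    moved = sum(1 for k in keys if shard_of(k, old_shards) != shard_of(k, new_shards))
    return moved / len(keys)

print(f"4 -> 5 shards: {moved_fraction(4, 5):.0%} of keys move")
print(f"4 -> 8 shards: {moved_fraction(4, 8):.0%} of keys move")
```

With a uniform hash, a key stays put only when `h % 4 == h % 5`, which holds for 4 of every 20 residues of `h % 20`, so roughly 80% of keys move; doubling to 8 shards moves roughly half. Techniques such as consistent hashing are commonly used to limit how much data has to move.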
