This PR adds all the guides from the [Visual Guides](https://bytebytego.com/guides/) section of ByteByteGo to the repository, with proper links.

- [x] Markdown files for guides and categories are placed in `data/guides` and `data/categories`.
- [x] Guide links in the README are auto-generated by `scripts/readme.ts`. Every time you run `npm run update-readme`, the script reads the categories and guides from the folders above, generates production links for guides and categories, and populates the table of contents in the README. Future guides and categories will therefore be added to the README automatically.
- [x] Sorting inside the README matches the actual category and guide sorting on production.
| title | description | image | createdAt | draft | categories | tags |
|---|---|---|---|---|---|---|
| A Crash Course on Database Sharding | Learn database sharding: concepts, techniques, and implementation. | https://assets.bytebytego.com/diagrams/0065-a-crash-course-on-database-sharding.png | 2024-03-04 | false | | |
What is Database Sharding?
Database sharding is a type of database partitioning that separates very large databases into smaller, faster, more easily managed parts called data shards. The word shard means a small piece of something bigger.
Sharding is a form of horizontal partitioning: the rows of a table are split across shards, which typically live on separate servers. Each shard contains a portion of the data, and all shards together contain all of the data.
Why Sharding?
Sharding is implemented to solve these problems:
- Too much data on one machine: A single database server can only store so much data.
- Too many requests on one machine: A single database server can only handle so many requests.
- High latency: As data grows, query latency increases.
How Sharding Works
Sharding involves splitting a database into multiple, independent parts (shards) and distributing them across different servers or machines. Each shard contains a subset of the data, and all shards collectively hold the entire dataset.
Sharding Key
A sharding key is a column (or combination of columns) in the database table that determines how data is distributed across the shards. The sharding algorithm uses the sharding key to decide which shard a particular row is stored in.
Sharding Algorithm
The sharding algorithm is the logic that determines which shard a particular row of data should be stored in. The sharding algorithm uses the sharding key to make this determination.
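As a minimal sketch, a sharding algorithm can be modeled as a function that maps a row's sharding-key value to a shard ID. The names `partition`, `key_of`, and `shard_for` below are hypothetical, and plain dictionaries stand in for table rows. The point is only that every row lands in exactly one shard and that the shards together hold the whole table.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def partition(
    rows: List[Row],
    key_of: Callable[[Row], Any],      # extracts the sharding key from a row
    shard_for: Callable[[Any], int],   # the sharding algorithm: key -> shard ID
    num_shards: int,
) -> List[List[Row]]:
    """Split rows into num_shards lists using the given sharding algorithm."""
    shards: List[List[Row]] = [[] for _ in range(num_shards)]
    for row in rows:
        shard_id = shard_for(key_of(row))
        if not 0 <= shard_id < num_shards:
            raise ValueError(f"algorithm returned invalid shard {shard_id}")
        shards[shard_id].append(row)  # each row goes to exactly one shard
    return shards


# Hypothetical example: user_id is the sharding key, and a toy algorithm
# sends even IDs to shard 0 and odd IDs to shard 1.
users = [{"user_id": i, "name": f"user{i}"} for i in range(1, 7)]
shards = partition(users, key_of=lambda r: r["user_id"],
                   shard_for=lambda uid: uid % 2, num_shards=2)

for shard_id, rows in enumerate(shards):
    print(shard_id, [r["user_id"] for r in rows])
# 0 [2, 4, 6]
# 1 [1, 3, 5]
```

Keeping the key extraction and the algorithm as separate parameters mirrors the split described above: the sharding key says *what* to route on, and the algorithm says *where* it goes.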
Here are some common sharding algorithms:
- Range-based sharding: Data is divided into ranges based on the sharding key. For example, users with IDs from 1 to 1000 might be stored in shard 1, users with IDs from 1001 to 2000 in shard 2, and so on.
- Hash-based sharding: A hash function is applied to the sharding key to determine the shard. For example, `shard_id = hash(user_id) % num_shards`.
- Directory-based sharding: A lookup table (or directory) is used to map sharding keys to shard locations.
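The three algorithms above can be sketched as plain routing functions. This is an illustrative sketch, not a production router: the function names and the `upper_bounds` and `directory` parameters are hypothetical, and shard IDs are 0-indexed, so the text's "shard 1" for IDs 1 to 1000 is shard `0` here. For the hash-based variant, SHA-256 of the key's string form is used as an assumed stable hash, because Python's built-in `hash()` is salted per process for `str` keys and would route the same key differently across runs.

```python
import bisect
import hashlib
from typing import Dict, List


def range_shard(user_id: int, upper_bounds: List[int]) -> int:
    """Range-based: upper_bounds[i] is the largest ID stored in shard i."""
    i = bisect.bisect_left(upper_bounds, user_id)  # first bound >= user_id
    if i == len(upper_bounds):
        raise KeyError(f"no range covers {user_id}")
    return i


def hash_shard(user_id: int, num_shards: int) -> int:
    """Hash-based: shard_id = hash(user_id) % num_shards, with a stable hash."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def directory_shard(user_id: int, directory: Dict[int, int]) -> int:
    """Directory-based: an explicit lookup table maps each key to a shard."""
    return directory[user_id]


bounds = [1000, 2000, 3000]  # shards 0, 1, 2 cover 1-1000, 1001-2000, 2001-3000
print(range_shard(1, bounds), range_shard(1000, bounds), range_shard(1001, bounds))
# 0 0 1
print(0 <= hash_shard(42, 4) < 4)  # True
directory = {42: 2, 7: 0}  # hypothetical lookup table
print(directory_shard(42, directory))  # 2
```

The trade-off shows up in the code: range routing keeps adjacent IDs together but can concentrate traffic on one shard, hash routing spreads keys evenly but loses their ordering, and directory routing is the most flexible but requires storing and consulting the lookup table on every request.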
Sharding Approaches
- Application-level sharding: The application is responsible for determining which shard to use for each query.
- Middleware sharding: A middleware layer sits between the application and the database and handles the sharding logic.
- Database-native sharding: The database system itself provides sharding capabilities.
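A minimal sketch of application-level sharding follows, assuming each shard is a separate in-memory SQLite database standing in for a separate database server. The class `ShardedUserStore` and its methods are hypothetical. The application hashes the sharding key (`user_id`) to pick a connection, so single-user queries touch one shard, while a query without the key has to fan out to all of them.

```python
import hashlib
import sqlite3
from typing import List, Optional, Tuple


class ShardedUserStore:
    """Application-level sharding: the app picks the shard for every query."""

    def __init__(self, num_shards: int) -> None:
        # One independent in-memory database per shard (stand-in for a server).
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for conn in self.shards:
            conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")

    def _shard_for(self, user_id: int) -> sqlite3.Connection:
        # Hash-based routing on the sharding key, using a stable hash.
        digest = hashlib.sha256(str(user_id).encode()).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, user_id: int, name: str) -> None:
        conn = self._shard_for(user_id)
        with conn:  # commits the transaction on success, rolls back on error
            conn.execute("INSERT OR REPLACE INTO users VALUES (?, ?)", (user_id, name))

    def get(self, user_id: int) -> Optional[str]:
        # Single-shard query: only the shard that owns user_id is touched.
        row = self._shard_for(user_id).execute(
            "SELECT name FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def all_users(self) -> List[Tuple[int, str]]:
        # No sharding key in the query, so it must visit every shard.
        rows: List[Tuple[int, str]] = []
        for conn in self.shards:
            rows.extend(conn.execute("SELECT user_id, name FROM users").fetchall())
        return sorted(rows)


store = ShardedUserStore(num_shards=3)
for uid, name in [(1, "alice"), (2, "bob"), (3, "carol")]:
    store.put(uid, name)
print(store.get(2))       # bob
print(store.all_users())  # [(1, 'alice'), (2, 'bob'), (3, 'carol')]
```

In middleware sharding, the `_shard_for` routing would move out of the application into a proxy layer; in database-native sharding, the database performs it itself.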
Sharding Challenges
- Increased Complexity: Sharding adds complexity to the database infrastructure and application code.
- Data Distribution: Choosing the right sharding key and algorithm is crucial for even data distribution and performance.
- Joins and Transactions: Performing joins and transactions across shards can be challenging and may require distributed transaction management.
- Resharding: Changing the sharding scheme after the database has been sharded can be a complex and time-consuming process.
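To see why resharding is expensive with the `hash(key) % num_shards` scheme, the sketch below counts how many keys change shards when one shard is added. It assumes SHA-256 as a stable, roughly uniform hash; `hash_shard` and `fraction_moved` are hypothetical helpers. A key stays put only when `h % 4 == h % 5`, which holds for 4 of the 20 residues of `h % 20`, so roughly 80% of keys, and the rows they identify, would have to be copied to a different server.

```python
import hashlib
from typing import Sequence


def hash_shard(key: int, num_shards: int) -> int:
    # Stable hash of the sharding key, then modulo the shard count.
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys: Sequence[int], old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if hash_shard(k, old_shards) != hash_shard(k, new_shards)
    )
    return moved / len(keys)


keys = range(10_000)
print(f"{fraction_moved(keys, 4, 5):.1%} of keys move going from 4 to 5 shards")
```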
