--- title: "Vertical vs Horizontal Partitioning" description: "Explore vertical vs horizontal partitioning strategies in databases." image: "https://assets.bytebytego.com/diagrams/0402-vertical-partitioning-vs-horizontal-partitioning.png" createdAt: "2024-01-30" draft: false categories: - cloud-distributed-systems tags: - "Database Partitioning" - "Sharding" --- ![](https://assets.bytebytego.com/diagrams/0402-vertical-partitioning-vs-horizontal-partitioning.png) In many large-scale applications, data is divided into partitions that can be accessed separately. There are two typical strategies for partitioning data. * Vertical partitioning: it means some columns are moved to new tables. Each table contains the same number of rows but fewer columns (see diagram below). * Horizontal partitioning (often called sharding): it divides a table into multiple smaller tables. Each table is a separate data store, and it contains the same number of columns, but fewer rows (see diagram below). Horizontal partitioning is widely used so let’s take a closer look. ## Routing algorithm The routing algorithm decides which partition (shard) stores the data. * Range-based sharding. This algorithm uses ordered columns, such as integers, longs, timestamps, to separate the rows. For example, the diagram below uses the User ID column for range partition: User IDs 1 and 2 are in shard 1, User IDs 3 and 4 are in shard 2. * Hash-based sharding. This algorithm applies a hash function to one column or several columns to decide which row goes to which table. For example, the diagram below uses **User ID mod 2** as a hash function. User IDs 1 and 3 are in shard 1, User IDs 2 and 4 are in shard 2. ## Benefits * Facilitate horizontal scaling. Sharding facilitates the possibility of adding more machines to spread out the load. * Shorten response time. By sharding one table into multiple tables, queries go over fewer rows, and results are returned much more quickly. ## Drawbacks * The order by operation is more complicated. Usually, we need to fetch data from different shards and sort the data in the application's code. * Uneven distribution. Some shards may contain more data than others (this is also called the hotspot).