Why B-trees Can Shorten SSD Lifespan
8/26/2025
B-trees are everywhere — in databases, filesystems, and countless storage engines.
They were designed decades ago for spinning disks, where the cost of a random seek dominated everything.
By keeping trees shallow and packing many keys into each node, they kept lookups efficient.
On SSDs, things start off fine. Reads are fast, random access is cheap, and at first, writes don’t seem like a problem either.
But over time, the way B-trees handle updates begins to clash with how flash memory actually works.
The result isn’t just slower performance — it’s a measurable reduction in SSD lifespan.
A Simple Example
Setup:
- Page size: 4 KB
- Erase block size: 256 KB (64 pages per block)
- Operation: Insert a record into a B-tree leaf page
From the B-tree’s perspective, this is easy:
Find the right page (say page #17) and update it in place.
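But flash cannot overwrite a page in place: a page must be erased before it can be programmed again, and erasure only happens a whole erase block at a time. In the naive worst case, updating page #17 means: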
- Read the entire 256 KB erase block containing page #17
- Modify 4 KB in memory
- Erase the whole 256 KB block
- Write the 256 KB back
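That’s 64× more physical writing than the logical write.
Real controllers avoid paying this on every update by writing the new page to a free location and marking the old copy stale, but that defers the cost to garbage collection rather than eliminating it.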
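The 64× figure is simple arithmetic, sketched here in Python. This models only the naive read-erase-rewrite path from the example; the function name and parameters are illustrative, not taken from any real tool.

```python
def naive_write_amplification(page_size_kb: int, erase_block_kb: int) -> float:
    """Physical KB written per logical KB under a naive read-erase-rewrite model."""
    if erase_block_kb % page_size_kb != 0:
        raise ValueError("erase block must hold a whole number of pages")
    # One logical page write forces the whole erase block to be rewritten.
    return erase_block_kb / page_size_kb


pages_per_block = 256 // 4
print(pages_per_block)                     # 64
print(naive_write_amplification(4, 256))   # 64.0
```

With a larger erase block (some drives use several megabytes), the same model gives a proportionally worse ratio.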
Hot Spot Problem
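B-tree updates are not spread evenly: leaves covering hot key ranges (for example, the rightmost leaf under sequential inserts) are rewritten on almost every update, and the internal nodes above them are rewritten whenever a split propagates upward, while most other pages are touched only occasionally.
On an SSD, this means the same logical pages get rewritten again and again.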
Flash memory wears out after a limited number of program/erase (P/E) cycles. Commonly cited figures are roughly 1,000 to 3,000 cycles for consumer TLC NAND and 3,000 to 10,000 for MLC.
If certain blocks are hit disproportionately, they’ll wear out far earlier than the rest of the drive.
The SSD controller counters this with wear leveling and can remap failing blocks to spares, but wear leveling costs extra writes of its own and the spare pool is finite.
Takeaway: Hot spots in B-trees accelerate localized wear, cutting into SSD endurance.
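To see how much a skewed update pattern can matter, here is a minimal sketch that assumes no wear leveling at all, which is a deliberate simplification since real controllers do spread wear. The function name, the 100-block drive, and the 3,000-cycle limit are hypothetical values chosen for illustration.

```python
def rewrites_until_first_wearout(weights: list[int], pe_limit: int) -> float:
    """Total block rewrites absorbed before the hottest block reaches pe_limit.

    weights[i] is the relative share of rewrites that land on block i.
    Assumes rewrites stay on the same physical block (no wear leveling).
    """
    hottest_share = max(weights) / sum(weights)
    return pe_limit / hottest_share


even = [1] * 100             # every block rewritten equally often
skewed = [99] + [1] * 99     # one hot block takes half of all rewrites

print(rewrites_until_first_wearout(even, 3000))    # 300000.0
print(rewrites_until_first_wearout(skewed, 3000))  # 6000.0
```

In this toy model the skewed workload wears out its first block after 50× fewer total rewrites. Wear leveling narrows that gap in practice, at the price of the extra data movement it performs.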
The Garbage Collection Tax
As the drive fills up, the SSD controller must reclaim blocks that hold a mix of valid and stale pages.
This process, garbage collection (GC), copies the still-valid pages into a fresh block and then erases the old one so it can accept new writes.
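With B-trees, updates scatter small writes across many pages.
Each update leaves a stale copy behind, so stale pages end up sprinkled thinly across many blocks, and garbage collection has to copy a lot of still-valid data just to free a little space.
A single 4 KB update can therefore trigger many additional page writes once its block is reclaimed.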
Takeaway: Random, scattered updates from B-trees amplify the work garbage collection must do, pushing write amplification even higher.
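A common back-of-the-envelope model makes the GC cost concrete: if the block chosen for cleaning still holds some valid pages, those pages must be copied before the rest of the block can take new data. The sketch below uses that simplified model with the 64-page block from the earlier example; the function name is illustrative, and real controllers use more elaborate victim selection.

```python
def gc_write_amplification(valid_pages: int, pages_per_block: int = 64) -> float:
    """Physical page writes per host page write when GC reclaims a block
    that still holds `valid_pages` live pages."""
    if not 0 <= valid_pages < pages_per_block:
        raise ValueError("valid_pages must be in [0, pages_per_block)")
    freed = pages_per_block - valid_pages
    # GC copies `valid_pages` pages so the host can then write `freed` new ones.
    return (valid_pages + freed) / freed


for valid in (0, 32, 48, 56):
    print(valid, gc_write_amplification(valid))
# 0 1.0
# 32 2.0
# 48 4.0
# 56 8.0
```

The fuller the reclaimed blocks are with valid data, the steeper the penalty, which is why thinly scattered stale pages are expensive.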
The Downward Spiral
Put together, the issues look like this:
- Frequent updates → same nodes, same blocks, same cells
- Hot spot wear → controller remaps blocks, spare pool shrinks
- More GC overhead → higher write amplification
- Higher amplification → more erases per logical write
This feedback loop means that over time:
- Performance slows down (GC competes with real writes)
- Lifespan shrinks (cells wear out sooner)
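The link between write amplification and lifespan can be approximated with a standard endurance estimate: total writable volume is capacity times P/E cycles, divided by the write amplification factor. The sketch below assumes perfectly even wear, and the drive size, cycle count, and daily write volume are hypothetical numbers chosen only to show the trend.

```python
def estimated_lifetime_days(capacity_gb: float, pe_cycles: int,
                            host_writes_gb_per_day: float,
                            write_amplification: float) -> float:
    """Rough drive lifetime in days, assuming wear is spread evenly."""
    total_physical_writes_gb = capacity_gb * pe_cycles
    physical_writes_per_day = host_writes_gb_per_day * write_amplification
    return total_physical_writes_gb / physical_writes_per_day


# Hypothetical drive: 500 GB, 3,000 P/E cycles, 50 GB of host writes per day.
print(estimated_lifetime_days(500, 3000, 50, 2.0))  # 15000.0
print(estimated_lifetime_days(500, 3000, 50, 8.0))  # 3750.0
```

Under these assumptions, raising write amplification from 2 to 8 cuts the estimated lifetime to a quarter, with the host workload unchanged.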
Conclusion
B-trees aren’t “bad” for SSDs from the start.
They perform decently at first, especially for read-heavy workloads.
But under sustained updates, their in-place modification pattern directly collides with the erase-block nature of flash.
The result is higher write amplification, uneven wear, and reduced endurance.
This is why many modern storage systems on SSDs turn to alternatives like:
- LSM trees (RocksDB, Cassandra)
- Copy-on-write B-trees (Btrfs, APFS)
These structures trade some read efficiency, and in the case of LSM trees some background compaction work, for larger and more sequential writes that suit flash far better, which in turn can mean a longer SSD life.
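As a rough illustration of why batching helps, the toy comparison below counts pages written for a burst of small updates. It assumes every in-place update dirties a different 4 KB page, and it ignores write-ahead logging, LSM compaction, and copy-on-write path rewrites, all of which add writes in real systems. The function names and the 100-byte record size are hypothetical.

```python
import math

PAGE_BYTES = 4096  # 4 KB page, as in the example setup


def pages_written_in_place(num_updates: int) -> int:
    """Toy model: each small update rewrites one whole page."""
    return num_updates


def pages_written_batched(num_updates: int, record_bytes: int) -> int:
    """Toy model: updates are buffered in memory, then flushed as one sequential run."""
    return math.ceil(num_updates * record_bytes / PAGE_BYTES)


print(pages_written_in_place(1000))      # 1000
print(pages_written_batched(1000, 100))  # 25
```

The gap shrinks once compaction is accounted for, so this is best read as an upper bound on the benefit rather than a measurement.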