Understanding the hidden costs of traditional database structures on modern storage
The Problem in One Sentence
B-trees work fine on SSDs initially, but their in-place updates cause high write amplification and hot-spot wear over time, resulting in slower performance and shorter SSD lifespan compared to flash-optimized alternatives.
Background: Why B-trees Exist
Let me start with some context. B-trees are everywhere today:
- Database engines (MySQL, PostgreSQL)
- Filesystems (NTFS, ext4)
- Storage systems across the industry
They were designed decades ago for spinning disks, where:
- Random seeks were incredibly expensive
- The goal was to minimize disk head movement
- Solution: Keep trees shallow, pack many keys per node
This design made perfect sense for mechanical storage.
The SSD Reality Check
When SSDs first arrived, B-trees seemed to work great:
- Reads are lightning fast
- Random access is practically free
- Initial writes perform well too
But here's the catch: Over time, B-trees' update patterns clash fundamentally with how flash memory actually works.
The result isn't just slower performance—it's measurable reduction in your SSD's lifespan.
The Write Amplification Problem
Let me show you what happens with a simple example:
The Setup
- Page size: 4 KB
- Erase block size: 256 KB (64 pages per block)
- Operation: insert one record into a B-tree leaf page
What the B-tree "Thinks" Happens
- Find page #17
- Update it in place
- Done!
What Actually Happens on the SSD
- Read the entire 256 KB erase block containing page #17
- Modify just 4 KB in memory
- Erase the whole 256 KB block
- Write the entire 256 KB back
Result: 64× more physical work than the logical 4 KB write (256 KB / 4 KB). In practice the flash translation layer defers the erase by redirecting the updated page to a fresh location, but that debt comes due later, during garbage collection.
This is called write amplification, and it's just the beginning.
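To make the arithmetic concrete, here's a minimal back-of-the-envelope calculation using the numbers from the setup above. It models only the naive read-modify-erase-write cycle; real controllers defer and batch this work, but the ratio is the point.

```python
# Toy model of the naive in-place update described above.
PAGE_SIZE = 4 * 1024          # logical write: one 4 KB B-tree page
ERASE_BLOCK = 256 * 1024      # smallest unit flash can erase

logical_bytes = PAGE_SIZE
physical_bytes = ERASE_BLOCK  # whole block read, erased, rewritten

print(f"write amplification: {physical_bytes / logical_bytes:.0f}x")  # -> 64x
```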
The Hot Spot Problem
B-trees have a natural hierarchy problem:
Root and upper nodes → Sit on the path of every insert and delete; splits and merges rewrite them over and over, so the same few pages churn constantly
Leaf nodes → Updated only when their specific keys change
What This Means for Your SSD
- The same erase blocks get rewritten constantly
- Flash memory cells wear out after 3,000–10,000 program/erase cycles
- Hot spots wear out much faster than the rest of the drive
- SSD controller remaps failed blocks to spares
- But spare blocks are finite
Bottom line: Uneven wear patterns in B-trees directly accelerate SSD degradation.
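A toy simulation makes the skew visible. Everything here is an illustrative assumption (the block count, the update mix, the endurance figure), not a measurement, and real controllers wear-level to spread this out. The point is how fast a hot block burns through its program/erase budget relative to the average.

```python
import random

# Hypothetical drive: 1,000 erase blocks rated for 3,000 P/E cycles each.
NUM_BLOCKS = 1_000
ENDURANCE = 3_000
HOT_BLOCKS = 10   # assume the tree's upper-level pages live in 10 blocks

random.seed(0)
erases = [0] * NUM_BLOCKS
for _ in range(1_000_000):
    # Assumption: half of all physical rewrites hit the hot upper levels.
    if random.random() < 0.5:
        block = random.randrange(HOT_BLOCKS)
    else:
        block = random.randrange(NUM_BLOCKS)
    erases[block] += 1

avg = sum(erases) / NUM_BLOCKS
print(f"hottest block: {max(erases):,} erases "
      f"({max(erases) / ENDURANCE:.0%} of rated endurance)")
print(f"average block: {avg:,.0f} erases ({avg / ENDURANCE:.0%} of rated endurance)")
# The hot blocks blow far past their P/E budget while most of the
# drive is barely used: exactly the uneven wear described above.
```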
The Garbage Collection Tax
As your SSD fills up, another problem emerges: garbage collection.
How Garbage Collection Works
- SSD controller needs to clean partially-used blocks
- Moves valid pages out of a block
- Erases the block
- Writes data back elsewhere
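The four steps above translate into a short sketch. The block representation here (a list of pages flagged valid or invalid, plus an erase counter) is my own simplification, not what any real flash translation layer looks like.

```python
# Minimal sketch of one garbage-collection pass over a victim block.
PAGE_SIZE = 4096

def collect(victim: dict, free_blocks: list) -> int:
    """Reclaim one partially-used block; return bytes physically relocated."""
    valid = [p for p in victim["pages"] if p["valid"]]
    # 1. Move the still-valid pages into a fresh block...
    destination = free_blocks.pop()
    destination["pages"].extend(valid)
    # 2. ...erase the victim (one more P/E cycle on its cells)...
    victim["pages"].clear()
    victim["erase_count"] += 1
    # 3. ...and return the emptied block to the free pool.
    free_blocks.append(victim)
    return len(valid) * PAGE_SIZE  # writes the application never issued
```

Every byte collect() relocates is the tax: physical writing the host never asked for.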
Why B-trees Make This Worse
- B-tree updates scatter small writes across many different pages
- Scattered writes leave many blocks only partially invalid, so garbage collection must relocate more valid data to reclaim the same space
- A single 4 KB update can cascade into multiple block rewrites during GC
- The drive's internal amplification multiplies with the application's, rather than merely adding to it
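A standard first-order approximation shows why mostly-valid blocks are so expensive to clean: if a victim block is still a fraction u valid, reclaiming the other (1 − u) requires relocating the u that remains, so each unit of reclaimed space costs 1 / (1 − u) units of physical writing. This is a common textbook simplification, not the behavior of any specific controller.

```python
# First-order GC cost: freeing (1 - u) of a block means relocating
# the u fraction that is still valid.
def gc_write_amplification(u: float) -> float:
    return 1 / (1 - u)

for u in (0.2, 0.5, 0.8, 0.9):
    print(f"{u:.0%} valid -> {gc_write_amplification(u):.1f}x amplification")
# Scattered B-tree updates invalidate a little data in many blocks,
# keeping u high everywhere: the expensive end of this curve.
```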
The Downward Spiral
These problems don't exist in isolation—they feed on each other:
Frequent B-tree updates
↓
Same blocks hit repeatedly (hot spots)
↓
Controller remaps worn blocks, spare pool shrinks
↓
More garbage collection overhead needed
↓
Higher write amplification across the board
↓
More erases per logical write
↓
Performance degrades + Lifespan shortens
This creates a feedback loop where the problems accelerate over time.
Real-World Impact
Performance Degradation
- Garbage collection competes with real application writes
- Response times become unpredictable
- Throughput drops significantly under sustained load
Lifespan Reduction
- Flash cells wear out sooner due to excessive erase cycles
- Drives can fail years earlier than their rated lifespan
- Premature replacement costs
The Hidden Costs
- Increased infrastructure replacement budgets
- Potential data availability issues
- Performance troubleshooting overhead
The Modern Solution
This is why the industry has largely moved away from B-trees for write-heavy workloads on SSDs:
LSM Trees (Log-Structured Merge Trees)
- Used by: RocksDB, Cassandra, LevelDB
- How they work: Sequential writes, periodic compaction
- Trade-off: Slightly more complex reads for much better write patterns
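Here's a heavily simplified sketch of the LSM idea, not any real engine's API: writes land in an in-memory table, and a full table is flushed to storage as one sorted, sequential run. RocksDB and friends add write-ahead logging, bloom filters, and leveled compaction on top; this only shows the write pattern flash cares about.

```python
# Toy LSM tree: buffer updates in RAM, flush them as sorted immutable runs.
class ToyLSM:
    def __init__(self, memtable_limit: int = 4):
        self.memtable: dict[str, str] = {}            # random access stays in RAM
        self.runs: list[list[tuple[str, str]]] = []   # immutable on-"disk" runs
        self.limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # One large sequential write instead of scattered page updates.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable.clear()

    def get(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):   # newest run wins
            for k, v in run:              # a real engine would binary-search
                if k == key:
                    return v
        return None
```

Compaction would periodically merge runs (more sequential I/O); reads may have to consult several runs, which is exactly the trade-off named above.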
Copy-on-Write B-trees
- Used by: Btrfs, APFS, ZFS
- How they work: Never modify data in place, always write to new locations
- Trade-off: More metadata overhead for better flash compatibility
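And a minimal sketch of the copy-on-write idea, using an immutable node type invented for illustration rather than any filesystem's actual structures: an update writes a new leaf plus fresh copies of every node on the path up to a new root, and the old version stays intact.

```python
from dataclasses import dataclass

@dataclass(frozen=True)              # frozen: nodes are never mutated
class Node:
    keys: tuple
    children: tuple = ()             # empty for leaf nodes

def cow_update(node: Node, path: list, new_leaf: Node) -> Node:
    """Return a new root with new_leaf swapped in at the given child path."""
    if not path:
        return new_leaf
    i = path[0]
    new_child = cow_update(node.children[i], path[1:], new_leaf)
    new_children = node.children[:i] + (new_child,) + node.children[i + 1:]
    return Node(node.keys, new_children)  # fresh copy; old node untouched

leaf = Node(keys=("a",))
old_root = Node(keys=("m",), children=(leaf, Node(keys=("z",))))
new_root = cow_update(old_root, [0], Node(keys=("a", "b")))
assert old_root.children[0] is leaf       # the old version still exists
```

Flash only ever sees new pages being written; the extra metadata for tracking versions is the cost of never overwriting anything in place.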
Both approaches sacrifice some read efficiency for dramatically better write behavior and longer SSD life.
Key Takeaways
- B-trees aren't inherently "bad" for SSDs—they work fine initially, especially for read-heavy workloads
- The problems emerge over time under sustained write workloads due to fundamental mismatches with flash memory
- Write amplification is the root cause: what looks like a 4 KB write can balloon into 256 KB or more of physical work
- Modern alternatives exist that are specifically designed for flash storage characteristics
- The choice of data structure can significantly impact both performance and hardware longevity