Pranay Netha Guda | Full Stack Developer

Introduction

Multi-Version Concurrency Control (MVCC) is PostgreSQL's approach to maintaining data consistency while allowing multiple transactions to work simultaneously. Unlike other databases that use separate undo logs, PostgreSQL keeps multiple versions of the same row right in the table itself. This guide will take you on a journey from basic MVCC concepts to examining the actual bytes on disk.

What we'll cover:

Understanding transaction IDs and why they matter
How PostgreSQL stores version information in each row
Building tools to inspect this hidden data
Walking through a complete update example
Understanding when and why vacuum is needed

Understanding Undo Management

Before diving into MVCC, let's understand the problem it solves. In relational databases, transactions can fail or be rolled back for various reasons - crashes, constraint violations, deadlocks, or explicit rollbacks. When this happens, the database needs a way to undo changes and restore consistency.

Different databases handle this differently:

Oracle and MySQL: Use separate storage areas for undo information
PostgreSQL: Maintains multiple versions of data within the same table, using transaction IDs to track which version belongs to which transaction

Transaction IDs: The Foundation of MVCC

Every transaction in PostgreSQL gets assigned a unique Transaction ID (txid):

Range: 32-bit unsigned integers (0 to 2^32 - 1)
Practical limit: ~2.1 billion usable transaction IDs
Special reserved IDs:

Important: Transaction IDs are circular - after reaching the maximum, they wrap back to 0. This creates interesting visibility rules we'll explore.

System Columns: Where MVCC Magic Happens

Here's the key insight: every row in PostgreSQL contains hidden system columns that track its version history. The two most important are:

xmin: The transaction ID that created this row version
xmax: The transaction ID that deleted or updated this row version

These columns, combined with each transaction's snapshot of which other transactions were running, determine what data each transaction can see.

Setting Up Our Example

Let's create a simple table to explore these concepts:

CREATE TABLE acct (
  id int PRIMARY KEY,
  owner text NOT NULL,
  balance int NOT NULL
);

INSERT INTO acct VALUES (1, 'alice', 1000);sql

When you run SELECT * FROM acct;, you see the regular columns. But PostgreSQL is storing much more information with each row.

Building Our Analysis Tool

To see the hidden MVCC data, we need to examine the raw page structure. Let's create a helper function using the pageinspect extension:

sql

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Helper view to examine the internal structure of our acct table
CREATE OR REPLACE VIEW acct_heap_p0 AS
SELECT
  h.lp                                    AS lp,          -- line pointer (slot) on the page
  (h.t_ctid)::text                        AS ctid,        -- tuple identifier
  h.t_xmin                                AS xmin,        -- creating transaction ID
  h.t_xmax                                AS xmax,        -- deleting/updating transaction ID
  h.t_infomask                            AS infomask,    -- status flags
  h.t_infomask2                           AS infomask2,   -- more status flags
  
  -- Decode key status bits:
  (h.t_infomask & 128 )::boolean          AS xmax_is_lock,        -- Row is locked, not deleted
  (h.t_infomask & 1024)::boolean          AS xmax_is_committed,   -- xmax transaction committed
  (h.t_infomask & 2048)::boolean          AS xmax_invalid,        -- xmax is not valid
  (h.t_infomask & 4096)::boolean          AS xmax_multixact,      -- xmax refers to multiple transactions
  
  -- Show the actual row data:
  h.t_attrs[1]                            AS id,
  h.t_attrs[2]                            AS owner,
  h.t_attrs[3]                            AS balance
FROM heap_page_item_attrs(get_raw_page('public.acct', 0), 'public.acct'::regclass) AS h;sql

Understanding infomask Flags

The infomask field contains bit flags that tell PostgreSQL how to interpret the xmin and xmax values:

xmax_is_lock (HEAP_XMAX_LOCK_ONLY): The xmax represents a row lock (like SELECT ... FOR UPDATE), not a delete or update. The row is still alive.
xmax_is_committed (HEAP_XMAX_COMMITTED): The transaction in xmax has committed. If it was a delete/update, this row version is now obsolete.
xmax_invalid (HEAP_XMAX_INVALID): The xmax value should be ignored - either no transaction ever modified this row, or the modifying transaction was aborted.
xmax_multixact (HEAP_XMAX_IS_MULTI): The xmax refers to a MultiXact ID representing multiple concurrent lock holders.

MVCC in Action: The Update Example

Now let's see MVCC working. We'll open two database sessions and watch what happens during an update.

Step 1: Start Two Transactions

Session A:

BEGIN;
SELECT txid_current(); -- Returns 767sql

Session B:

BEGIN;
SELECT txid_current(); -- Returns 768sql

Step 2: Update the Record (Session A)

UPDATE acct SET balance = balance - 100 WHERE id = 1;
-- Transaction 767 is updating the row, but hasn't committed yetsql

Step 3: Check Visibility (Session B)

SELECT * FROM acct; -- Still shows balance = 1000sql

Session B can't see the update because transaction 767 hasn't committed yet. This is MVCC preventing dirty reads.

Step 4: Examine the Page Structure

SELECT * FROM acct_heap_p0;sql

Before commit, you'll see something like:

lp | xmin | xmax | xmax_is_lock | xmax_is_committed | xmax_invalid | xmax_multixact | balance
---+------+------+--------------+-------------------+--------------+----------------+---------
3  | 765  | 767  | f            | f                 | f            | f              | 1000
5  | 767  | 0    | f            | f                 | t            | f              | 900sql

What this tells us:

Row 3: Original row created by transaction 765, being updated by transaction 767
Row 5: New row version created by transaction 767

Step 5: Commit and Re-examine

- In Session A:
COMMIT;
- Now check the page again:
SELECT * FROM acct_heap_p0;sql

Initially, you might not see the flags change! This is because hint bits are set lazily - only when another process accesses the tuple or vacuum runs.

sql

- Trigger hint bit update by selecting:
SELECT * FROM acct;
- Now check again:
SELECT * FROM acct_heap_p0;sql

After commit and access:

lp | xmin | xmax | xmax_is_lock | xmax_is_committed | xmax_invalid | balance
---+------+------+--------------+-------------------+--------------+---------
3  | 765  | 767  | f            | t                 | f            | 1000
5  | 767  | 0    | f            | f                 | t            | 900sql

Now xmax_is_committed = true for row 3, indicating that transaction 767 successfully committed and this row version is obsolete.

How MVCC Determines Visibility

When your transaction (let's call it T) runs a SELECT, PostgreSQL checks each row version using this logic:

Step 1: Check xmin (Row Creator)

If xmin < T and the creator transaction committed before your snapshot → row might be visible
If xmin = T (your own transaction) → you always see your own changes
If xmin > T → row was created "in the future" from your perspective
If the creator transaction aborted → row is invisible regardless

Step 2: Check xmax (Row Killer)

If xmax = 0 → nobody killed this row → still alive
If xmax < T and the killer transaction committed before your snapshot → row was deleted → invisible
If xmax = T → you deleted/updated it yourself → you don't see the old version
If xmax > T → deletion happened "in the future" → row is still visible to you
If the killer transaction aborted → row is visible (deletion didn't happen)

This two-step process ensures each transaction sees a consistent snapshot of the database.

The Need for Vacuum

Let's examine what happens over time without vacuum:

sql

- After several updates, check the page:
SELECT * FROM acct_heap_p0;sql

You might see something like:

lp | xmin | xmax | xmax_is_committed | balance | visible_to_current_txns
---+------+------+-------------------+---------+------------------------
2  | 765  | 767  | t                 | 1000    | No (obsolete)
3  | 767  | 770  | t                 | 900     | No (obsolete) 
4  | 770  | 773  | t                 | 800     | No (obsolete)
5  | 773  | 0    | f                 | 700     | Yes (current)sql

The Problem: Dead tuples (rows 2, 3, 4) accumulate on disk. They serve no purpose - no current transaction can see them, but they're taking up space and slowing down scans.

The Solution: The vacuum process identifies these dead tuples and reclaims their space.

Auto Vacuum

PostgreSQL runs auto vacuum in the background by default. It has configurable thresholds that trigger vacuum when enough dead tuples accumulate. For tables with frequent updates, you might want to tune these thresholds:

- Example: More aggressive vacuum for a frequently updated table
ALTER TABLE acct SET (autovacuum_vacuum_threshold = 50);
ALTER TABLE acct SET (autovacuum_vacuum_scale_factor = 0.1);sql

Key Takeaways

MVCC enables concurrency: Readers and writers don't block each other because each transaction sees its own consistent snapshot.
Every row carries its history: xmin and xmax track which transactions created and destroyed each row version.
Visibility is determined by transaction snapshots: Your transaction ID plus knowledge of which transactions were running determines what you can see.
Dead tuples accumulate: MVCC creates obsolete row versions that need periodic cleanup.
Vacuum is essential: Without it, tables would grow indefinitely with dead tuples, degrading performance.
Hint bits optimize performance: PostgreSQL caches transaction commit/abort status in tuple headers to avoid repeatedly checking the commit log.

Understanding these internals helps explain PostgreSQL's behavior and guides tuning decisions for high-update workloads.

MVCC, Autovaccume, and Bloat in PostgreSQL