Introduction
Multi-Version Concurrency Control (MVCC) is PostgreSQL's approach to maintaining data consistency while allowing multiple transactions to work simultaneously. Unlike other databases that use separate undo logs, PostgreSQL keeps multiple versions of the same row right in the table itself. This guide will take you on a journey from basic MVCC concepts to examining the actual bytes on disk.
What we'll cover:
- Understanding transaction IDs and why they matter
- How PostgreSQL stores version information in each row
- Building tools to inspect this hidden data
- Walking through a complete update example
- Understanding when and why vacuum is needed
Understanding Undo Management
Before diving into MVCC, let's understand the problem it solves. In relational databases, transactions can fail or be rolled back for various reasons - crashes, constraint violations, deadlocks, or explicit rollbacks. When this happens, the database needs a way to undo changes and restore consistency.
Different databases handle this differently:
- Oracle and MySQL: Use separate storage areas for undo information
- PostgreSQL: Maintains multiple versions of data within the same table, using transaction IDs to track which version belongs to which transaction
Transaction IDs: The Foundation of MVCC
Every transaction in PostgreSQL gets assigned a unique Transaction ID (txid):
- Range: 32-bit unsigned integers (0 to 2^32 - 1)
- Practical limit: ~2.1 billion usable transaction IDs
- Special reserved IDs:
Important: Transaction IDs are circular - after reaching the maximum, they wrap back to 0. This creates interesting visibility rules we'll explore.
System Columns: Where MVCC Magic Happens
Here's the key insight: every row in PostgreSQL contains hidden system columns that track its version history. The two most important are:
- xmin: The transaction ID that created this row version
- xmax: The transaction ID that deleted or updated this row version
These columns, combined with each transaction's snapshot of which other transactions were running, determine what data each transaction can see.
Setting Up Our Example
Let's create a simple table to explore these concepts:
CREATE TABLE acct ( id int PRIMARY KEY, owner text NOT NULL, balance int NOT NULL ); INSERT INTO acct VALUES (1, 'alice', 1000);sql
When you run SELECT * FROM acct;, you see the regular columns. But PostgreSQL is storing much more information with each row.
Building Our Analysis Tool
To see the hidden MVCC data, we need to examine the raw page structure. Let's create a helper function using the pageinspect extension:
sql
CREATE EXTENSION IF NOT EXISTS pageinspect; -- Helper view to examine the internal structure of our acct table CREATE OR REPLACE VIEW acct_heap_p0 AS SELECT h.lp AS lp, -- line pointer (slot) on the page (h.t_ctid)::text AS ctid, -- tuple identifier h.t_xmin AS xmin, -- creating transaction ID h.t_xmax AS xmax, -- deleting/updating transaction ID h.t_infomask AS infomask, -- status flags h.t_infomask2 AS infomask2, -- more status flags -- Decode key status bits: (h.t_infomask & 128 )::boolean AS xmax_is_lock, -- Row is locked, not deleted (h.t_infomask & 1024)::boolean AS xmax_is_committed, -- xmax transaction committed (h.t_infomask & 2048)::boolean AS xmax_invalid, -- xmax is not valid (h.t_infomask & 4096)::boolean AS xmax_multixact, -- xmax refers to multiple transactions -- Show the actual row data: h.t_attrs[1] AS id, h.t_attrs[2] AS owner, h.t_attrs[3] AS balance FROM heap_page_item_attrs(get_raw_page('public.acct', 0), 'public.acct'::regclass) AS h;sql
Understanding infomask Flags
The infomask field contains bit flags that tell PostgreSQL how to interpret the xmin and xmax values:
- xmax_is_lock (HEAP_XMAX_LOCK_ONLY): The xmax represents a row lock (like SELECT ... FOR UPDATE), not a delete or update. The row is still alive.
- xmax_is_committed (HEAP_XMAX_COMMITTED): The transaction in xmax has committed. If it was a delete/update, this row version is now obsolete.
- xmax_invalid (HEAP_XMAX_INVALID): The xmax value should be ignored - either no transaction ever modified this row, or the modifying transaction was aborted.
- xmax_multixact (HEAP_XMAX_IS_MULTI): The xmax refers to a MultiXact ID representing multiple concurrent lock holders.
MVCC in Action: The Update Example
Now let's see MVCC working. We'll open two database sessions and watch what happens during an update.
Step 1: Start Two Transactions
Session A:
BEGIN; SELECT txid_current(); -- Returns 767sql
Session B:
BEGIN; SELECT txid_current(); -- Returns 768sql
Step 2: Update the Record (Session A)
UPDATE acct SET balance = balance - 100 WHERE id = 1; -- Transaction 767 is updating the row, but hasn't committed yetsql
Step 3: Check Visibility (Session B)
SELECT * FROM acct; -- Still shows balance = 1000sql
Session B can't see the update because transaction 767 hasn't committed yet. This is MVCC preventing dirty reads.
Step 4: Examine the Page Structure
SELECT * FROM acct_heap_p0;sql
Before commit, you'll see something like:
lp | xmin | xmax | xmax_is_lock | xmax_is_committed | xmax_invalid | xmax_multixact | balance ---+------+------+--------------+-------------------+--------------+----------------+--------- 3 | 765 | 767 | f | f | f | f | 1000 5 | 767 | 0 | f | f | t | f | 900sql
What this tells us:
- Row 3: Original row created by transaction 765, being updated by transaction 767
- Row 5: New row version created by transaction 767
Step 5: Commit and Re-examine
- In Session A: COMMIT; - Now check the page again: SELECT * FROM acct_heap_p0;sql
Initially, you might not see the flags change! This is because hint bits are set lazily - only when another process accesses the tuple or vacuum runs.
sql
- Trigger hint bit update by selecting: SELECT * FROM acct; - Now check again: SELECT * FROM acct_heap_p0;sql
After commit and access:
lp | xmin | xmax | xmax_is_lock | xmax_is_committed | xmax_invalid | balance ---+------+------+--------------+-------------------+--------------+--------- 3 | 765 | 767 | f | t | f | 1000 5 | 767 | 0 | f | f | t | 900sql
Now xmax_is_committed = true for row 3, indicating that transaction 767 successfully committed and this row version is obsolete.
How MVCC Determines Visibility
When your transaction (let's call it T) runs a SELECT, PostgreSQL checks each row version using this logic:
Step 1: Check xmin (Row Creator)
- If xmin < T and the creator transaction committed before your snapshot → row might be visible
- If xmin = T (your own transaction) → you always see your own changes
- If xmin > T → row was created "in the future" from your perspective
- If the creator transaction aborted → row is invisible regardless
Step 2: Check xmax (Row Killer)
- If xmax = 0 → nobody killed this row → still alive
- If xmax < T and the killer transaction committed before your snapshot → row was deleted → invisible
- If xmax = T → you deleted/updated it yourself → you don't see the old version
- If xmax > T → deletion happened "in the future" → row is still visible to you
- If the killer transaction aborted → row is visible (deletion didn't happen)
This two-step process ensures each transaction sees a consistent snapshot of the database.
The Need for Vacuum
Let's examine what happens over time without vacuum:
sql
- After several updates, check the page: SELECT * FROM acct_heap_p0;sql
You might see something like:
lp | xmin | xmax | xmax_is_committed | balance | visible_to_current_txns ---+------+------+-------------------+---------+------------------------ 2 | 765 | 767 | t | 1000 | No (obsolete) 3 | 767 | 770 | t | 900 | No (obsolete) 4 | 770 | 773 | t | 800 | No (obsolete) 5 | 773 | 0 | f | 700 | Yes (current)sql
The Problem: Dead tuples (rows 2, 3, 4) accumulate on disk. They serve no purpose - no current transaction can see them, but they're taking up space and slowing down scans.
The Solution: The vacuum process identifies these dead tuples and reclaims their space.
Auto Vacuum
PostgreSQL runs auto vacuum in the background by default. It has configurable thresholds that trigger vacuum when enough dead tuples accumulate. For tables with frequent updates, you might want to tune these thresholds:
- Example: More aggressive vacuum for a frequently updated table ALTER TABLE acct SET (autovacuum_vacuum_threshold = 50); ALTER TABLE acct SET (autovacuum_vacuum_scale_factor = 0.1);sql
Key Takeaways
- MVCC enables concurrency: Readers and writers don't block each other because each transaction sees its own consistent snapshot.
- Every row carries its history: xmin and xmax track which transactions created and destroyed each row version.
- Visibility is determined by transaction snapshots: Your transaction ID plus knowledge of which transactions were running determines what you can see.
- Dead tuples accumulate: MVCC creates obsolete row versions that need periodic cleanup.
- Vacuum is essential: Without it, tables would grow indefinitely with dead tuples, degrading performance.
- Hint bits optimize performance: PostgreSQL caches transaction commit/abort status in tuple headers to avoid repeatedly checking the commit log.
Understanding these internals helps explain PostgreSQL's behavior and guides tuning decisions for high-update workloads.