Multi-Version Concurrency Control (MVCC) is PostgreSQL's approach to maintaining data consistency while allowing multiple transactions to work simultaneously. Unlike other databases that use separate undo logs, PostgreSQL keeps multiple versions of the same row right in the table itself. This guide will take you on a journey from basic MVCC concepts to examining the actual bytes on disk.
What we'll cover:
Before diving into MVCC, let's understand the problem it solves. In relational databases, transactions can fail or be rolled back for various reasons - crashes, constraint violations, deadlocks, or explicit rollbacks. When this happens, the database needs a way to undo changes and restore consistency.
Different databases handle this differently:
Every transaction in PostgreSQL gets assigned a unique Transaction ID (txid):
Important: Transaction IDs are circular - after reaching the maximum, they wrap back to 0. This creates interesting visibility rules we'll explore.
Here's the key insight: every row in PostgreSQL contains hidden system columns that track its version history. The two most important are:
These columns, combined with each transaction's snapshot of which other transactions were running, determine what data each transaction can see.
Let's create a simple table to explore these concepts:
CREATE TABLE acct ( id int PRIMARY KEY, owner text NOT NULL, balance int NOT NULL ); INSERT INTO acct VALUES (1, 'alice', 1000);
sql
When you run SELECT * FROM acct;, you see the regular columns. But PostgreSQL is storing much more information with each row.
To see the hidden MVCC data, we need to examine the raw page structure. Let's create a helper function using the pageinspect extension:
sql
CREATE EXTENSION IF NOT EXISTS pageinspect; -- Helper view to examine the internal structure of our acct table CREATE OR REPLACE VIEW acct_heap_p0 AS SELECT h.lp AS lp, -- line pointer (slot) on the page (h.t_ctid)::text AS ctid, -- tuple identifier h.t_xmin AS xmin, -- creating transaction ID h.t_xmax AS xmax, -- deleting/updating transaction ID h.t_infomask AS infomask, -- status flags h.t_infomask2 AS infomask2, -- more status flags -- Decode key status bits: (h.t_infomask & 128 )::boolean AS xmax_is_lock, -- Row is locked, not deleted (h.t_infomask & 1024)::boolean AS xmax_is_committed, -- xmax transaction committed (h.t_infomask & 2048)::boolean AS xmax_invalid, -- xmax is not valid (h.t_infomask & 4096)::boolean AS xmax_multixact, -- xmax refers to multiple transactions -- Show the actual row data: h.t_attrs[1] AS id, h.t_attrs[2] AS owner, h.t_attrs[3] AS balance FROM heap_page_item_attrs(get_raw_page('public.acct', 0), 'public.acct'::regclass) AS h;
sql
The infomask field contains bit flags that tell PostgreSQL how to interpret the xmin and xmax values:
Now let's see MVCC working. We'll open two database sessions and watch what happens during an update.
Session A:
BEGIN; SELECT txid_current(); -- Returns 767
sql
Session B:
BEGIN; SELECT txid_current(); -- Returns 768
sql
UPDATE acct SET balance = balance - 100 WHERE id = 1; -- Transaction 767 is updating the row, but hasn't committed yet
sql
SELECT * FROM acct; -- Still shows balance = 1000
sql
Session B can't see the update because transaction 767 hasn't committed yet. This is MVCC preventing dirty reads.
SELECT * FROM acct_heap_p0;
sql
Before commit, you'll see something like:
lp | xmin | xmax | xmax_is_lock | xmax_is_committed | xmax_invalid | xmax_multixact | balance ---+------+------+--------------+-------------------+--------------+----------------+--------- 3 | 765 | 767 | f | f | f | f | 1000 5 | 767 | 0 | f | f | t | f | 900
sql
What this tells us:
- In Session A: COMMIT; - Now check the page again: SELECT * FROM acct_heap_p0;
sql
Initially, you might not see the flags change! This is because hint bits are set lazily - only when another process accesses the tuple or vacuum runs.
sql
- Trigger hint bit update by selecting: SELECT * FROM acct; - Now check again: SELECT * FROM acct_heap_p0;
sql
After commit and access:
lp | xmin | xmax | xmax_is_lock | xmax_is_committed | xmax_invalid | balance ---+------+------+--------------+-------------------+--------------+--------- 3 | 765 | 767 | f | t | f | 1000 5 | 767 | 0 | f | f | t | 900
sql
Now xmax_is_committed = true for row 3, indicating that transaction 767 successfully committed and this row version is obsolete.
When your transaction (let's call it T) runs a SELECT, PostgreSQL checks each row version using this logic:
This two-step process ensures each transaction sees a consistent snapshot of the database.
Let's examine what happens over time without vacuum:
sql
- After several updates, check the page: SELECT * FROM acct_heap_p0;
sql
You might see something like:
lp | xmin | xmax | xmax_is_committed | balance | visible_to_current_txns ---+------+------+-------------------+---------+------------------------ 2 | 765 | 767 | t | 1000 | No (obsolete) 3 | 767 | 770 | t | 900 | No (obsolete) 4 | 770 | 773 | t | 800 | No (obsolete) 5 | 773 | 0 | f | 700 | Yes (current)
sql
The Problem: Dead tuples (rows 2, 3, 4) accumulate on disk. They serve no purpose - no current transaction can see them, but they're taking up space and slowing down scans.
The Solution: The vacuum process identifies these dead tuples and reclaims their space.
PostgreSQL runs auto vacuum in the background by default. It has configurable thresholds that trigger vacuum when enough dead tuples accumulate. For tables with frequent updates, you might want to tune these thresholds:
- Example: More aggressive vacuum for a frequently updated table ALTER TABLE acct SET (autovacuum_vacuum_threshold = 50); ALTER TABLE acct SET (autovacuum_vacuum_scale_factor = 0.1);
sql
Understanding these internals helps explain PostgreSQL's behavior and guides tuning decisions for high-update workloads.