Database Crash Recovery: ARIES, WAL & Shadow Paging (2026)

PerfectNotes TeamUpdated June 2026

Key Takeaways

WAL is the Foundation — The Write-Ahead Log mandates that every change must be recorded in a durable log before the data page is modified. This single rule guarantees recoverability from any crash.
REDO Saves Committed Work — After a crash, RAM (and its dirty pages) is wiped. REDO replays every committed operation from the log to rebuild data that existed only in RAM at crash time.
UNDO Removes Incomplete Work — UNDO reverses every operation from transactions that were active (uncommitted) at the moment of the crash — enforcing ACID Atomicity.
ARIES = Analysis + REDO + UNDO — IBM's ARIES algorithm (1992) adds mathematical precision: LSN-based idempotence ensures recovery is safe even if the server crashes during recovery itself.
Shadow Paging is Obsolete — Shadow paging avoids logs but causes severe data fragmentation and cannot handle concurrent transactions at scale. WAL/ARIES dominates 99% of production RDBMS.

What Is Database Crash Recovery?

Every production database operates in a state of constant risk. At any moment, a power failure, hardware fault, OS panic, or network partition can kill the database server mid-transaction — with thousands of operations in flight, data partially written to RAM, and the physical disk pages in an inconsistent state. Without a recovery system, the database would be permanently corrupted on every crash.

Database Crash Recovery is the automated process the DBMS executes after a system failure to restore the database to the last consistent, ACID-compliant state. It guarantees two core ACID properties:

Atomicity — incomplete (uncommitted) transactions are completely reversed, as if they never occurred
Durability — every committed transaction's effects are permanently preserved, even if the crash happened the millisecond after the commit signal was sent

Split-screen: left shows an accountant rapidly writing in a notebook labeled WAL before touching money; right shows a heavy slow vault door labeled Database Disk representing the slow persistent storage — Figure 1: The WAL Analogy — the log (notebook) is always written first. The heavy vault (disk database file) is updated later. If power fails, the notebook survives and guides recovery.

Log-Based Recovery: WAL, REDO & UNDO

The dominant recovery mechanism in all major RDBMS is log-based recovery. It works by maintaining a sequential, append-only Write-Ahead Log (WAL) — a durable, ordered journal of every database operation. The WAL has one inviolable rule:

The WAL Rule (Write-Ahead Logging Protocol)

“A log record for a data modification must be written to durable storage before the modified data page is written to the database file.”

Log Record Structure

Every write to the WAL produces a log record containing:

-- Structure of a WAL log record

<LSN, TransactionID, PageID, Offset, OldValue, NewValue, PrevLSN>

LSN : Log Sequence Number — monotonically increasing unique ID for this record

TransactionID : Which transaction made this change

PageID : Which disk page was modified

OldValue : The before-image (used for UNDO)

NewValue : The after-image (used for REDO)

PrevLSN : LSN of the previous log record for this same transaction (forms a backward chain)

Normal Operation: The Buffer Pool Model

Modern databases do not write every committed change directly to disk — that would be catastrophically slow. Instead they maintain a Buffer Pool (an in-memory cache of disk pages) and defer physical writes:

Transaction modifies data → The change is applied to the in-memory buffer page, marking it as a dirty page.
WAL record written first → Before the dirty page can ever be flushed to disk, its log record must be durably written to the WAL file on disk.
Commit logged → A <COMMIT, TxID> record is written to the WAL and flushed. The user receives a success response.
Dirty pages linger in RAM → The actual database data file pages are not yet updated — they remain dirty in the buffer pool for efficiency.
Checkpoint flush → Periodically, the database flushes dirty pages to disk and writes a <CHECKPOINT> record to the log marking a safe recovery boundary.

Figure 2: The Checkpoint Recovery Timeline — recovery only needs to scan the log from the last checkpoint. T1 (committed after checkpoint) needs REDO; T2 (incomplete at crash) needs UNDO.

REDO: Replaying Committed Work

After a crash, RAM is wiped — all dirty buffer pool pages (including those holding committed data that was never flushed) are gone. The REDO phase replays the WAL forward from the last checkpoint, reapplying every change from committed transactions to rebuild the database to its exact pre-crash state.

Scenario	At Crash	REDO Action	Result
T1 committed, page not flushed	COMMIT record in log; data only in RAM	Replay the UPDATE to the disk page	Data saved ✅
T2 committed, page already flushed	COMMIT in log; disk page already has new data	PageLSN ≥ LSN_record → skip (idempotent)	No double-write ✅
T3 uncommitted, page partially flushed	No COMMIT in log; some dirty pages on disk	REDO applies changes — UNDO will reverse them	Temporary — UNDO follows ⚠

UNDO: Reversing Incomplete Transactions

The UNDO phase scans the WAL backward, identifying all loser transactions — those that were active (had no COMMIT record) at the time of the crash. For each operation in a loser transaction, the UNDO phase applies the before-image (the OldValue from the log record) to reverse the partial change, restoring the data to its pre-transaction state.

-- UNDO log record written during UNDO phase (Compensation Log Record - CLR)

<LSN=1050, TxID=T3, PageID=42, UndoNextLSN=970, Type=CLR>

CLR (Compensation Log Record): written to the WAL for every UNDO operation performed

UndoNextLSN: points to the next record to UNDO for this transaction (backward chain)

CLRs are never themselves undone — they mark permanent, irreversible recovery actions

Checkpoints: Bounding Recovery Cost

Without checkpoints, every crash would require scanning the entire WAL from the very beginning of time — potentially terabytes of log data. A checkpoint is a bookmarked synchronization point that tells the recovery engine: “everything before this point is safely on disk — only scan from here.”

Checkpoint Type	Mechanism	Impact on Transactions	Used By
Sharp Checkpoint	Halts all transactions, flushes ALL dirty pages to disk, writes checkpoint record	Severe stall — all queries paused during flush	Legacy systems, simple embedded DBs
Fuzzy Checkpoint	Begins checkpoint, continues accepting transactions, flushes dirty pages gradually in the background	No stall — transactions continue during checkpoint	PostgreSQL, MySQL InnoDB, Oracle, SQL Server
ARIES Checkpoint	Writes Transaction Table + Dirty Page Table to log as checkpoint content — no page flush required	Minimal overhead — only metadata written to log	IBM DB2, academic DBMS implementations

What a Checkpoint Record Contains (ARIES Style)

-- ARIES checkpoint record written to WAL

<CHECKPOINT_BEGIN, LSN=900>

<CHECKPOINT_END,

Transaction Table: [

{ TxID: T1, Status: Running, LastLSN: 870 },

{ TxID: T2, Status: Committed, LastLSN: 850 }

Dirty Page Table: [

{ PageID: 42, RecLSN: 820 },

{ PageID: 77, RecLSN: 865 }

]

RecLSN (Recovery LSN): the earliest LSN that might have dirtied this page — recovery scans from the minimum RecLSN

The ARIES Algorithm: 3-Phase Crash Recovery

Invented by Mohan, Haderle, Lindsay, Pirahesh, and Schwarz at IBM Research in 1992, ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is the gold standard for crash recovery in systems with concurrent transactions. It extends basic REDO/UNDO with LSN-based tracking to achieve mathematical idempotence — recovery can safely restart mid-way without corrupting data.

ARIES operates on three precisely sequenced phases upon database restart:

Figure 3: ARIES 3-Phase Recovery — Analysis scans forward to rebuild state; REDO scans forward to replay history; UNDO scans backward to reverse loser transactions. This strict sequence guarantees correctness.

Phase 1: Analysis — Reconstructing State

The Analysis phase scans the WAL forward from the last checkpoint record to the end of the log (the crash point). Its goal is to reconstruct two critical data structures:

Transaction Table — all transactions that were active at the crash, plus their last LSN and commit status
Dirty Page Table — all buffer pool pages that were modified but not yet flushed to disk at the crash, plus their RecLSN (earliest log record that dirtied them)

-- Analysis phase pseudo-logic

FOR EACH log record from CHECKPOINT to END_OF_LOG:

IF record.type == UPDATE:

add/update TxID in Transaction_Table (Status = Running, LastLSN = record.LSN)

add PageID to Dirty_Page_Table if not already present (RecLSN = record.LSN)

IF record.type == COMMIT:

update Transaction_Table: Status = Committed

IF record.type == ABORT:

mark as Loser (needs UNDO)

Result: Transaction_Table contains winners (COMMIT) and losers (no COMMIT) at crash time

Phase 2: REDO — Repeating History

The REDO phase scans the WAL forward, starting from the minimum RecLSN across all entries in the Dirty Page Table (the earliest point any unwritten change originated). It reapplies every single operation — including those from loser transactions — to bring the database to its exact physical state at the crash moment.

This is the critical ARIES insight: REDO repeats history completely before UNDO begins. This unconditional approach is what makes ARIES correct with concurrent transactions.

-- REDO idempotency check using LSN (the core ARIES innovation)

FOR EACH update record from min(RecLSN) to END_OF_LOG:

fetch PageID from disk

IF page.PageLSN >= record.LSN:

SKIP — this update is already on disk (idempotent check)

ELSE:

apply record.NewValue to page

set page.PageLSN = record.LSN

mark page as dirty in buffer pool

PageLSN ≥ LSN_record → skip: this makes ARIES idempotent (safe to crash mid-recovery)

Phase 3: UNDO — Rolling Back Losers

The UNDO phase scans the WAL backward, processing only the loser transactions identified in Phase 1 (those with no COMMIT record). For each operation, it applies the before-image (OldValue) to reverse the change. For every UNDO operation performed, ARIES writes a Compensation Log Record (CLR) to the WAL — ensuring that if the system crashes again during UNDO, these reversal operations are not accidentally re-applied.

ARIES Phase	Log Scan Direction	Starting Point	What It Does	Targets
1. Analysis	Forward →	Last Checkpoint	Rebuilds Transaction Table + Dirty Page Table	All transactions
2. REDO	Forward →	min(RecLSN) in Dirty Page Table	Reapplies ALL changes to restore crash-state	ALL transactions (winners + losers)
3. UNDO	Backward ←	End of log (crash point)	Reverses changes from uncommitted transactions using before-images	Only LOSER transactions (no COMMIT)

Shadow Paging: The Alternative Approach

Shadow paging is a fundamentally different recovery mechanism that eliminates the need for a REDO/UNDO log entirely. Instead of logging changes, it maintains two versions of the database page directory simultaneously and uses an atomic pointer swap to commit or discard transactions.

How Shadow Paging Works

The database maintains two directory tables — the Current Directory (pointing to the live, in-use data pages) and the Shadow Directory (a backup copy pointing to the original, unchanged pages):

Transaction begins → The Shadow Directory is made a copy of the Current Directory.
Page modification → Instead of modifying the existing page, the DBMS writes changes to a completely new, empty disk page. The Current Directory pointer is updated to point to the new page. The Shadow Directory still points to the original page.
Commit → The master page directory pointer is atomically swapped to the Current Directory. The old Shadow pages become garbage-collectible.
Crash before commit → The atomic swap never happened. The master pointer still points to the Shadow Directory. The in-progress current pages are simply discarded — no log processing needed.

Figure 4: Shadow Paging Pointer Swap — on commit, the master pointer atomically switches to the new page. On crash, the swap never occurs and the shadow (original) page remains active — instant, log-free recovery.

Feature	Write-Ahead Logging (WAL)	Shadow Paging
Recovery Mechanism	REDO + UNDO log processing	Discard shadow pages — instant
Normal Operation Overhead	Sequential log I/O (fast)	Full page copy per modification (expensive)
Concurrency	Excellent — optimized for thousands of concurrent transactions	Very poor — page directory conflicts under concurrent writes
Data Fragmentation	None — data stays sequential and clustered	Severe — new pages scattered randomly across disk
Recovery Time	Log-bounded — depends on checkpoint frequency	Near-instant — just discard uncommitted pages
Industry Adoption	99% of production RDBMS (PostgreSQL, Oracle, MySQL, SQL Server)	Mostly obsolete; SQLite rollback journal, ZFS/Btrfs filesystems

Recovery with Concurrent Transactions

The true complexity of crash recovery emerges when multiple transactions are running simultaneously at the time of a crash. A naive REDO/UNDO implementation would struggle because:

A single dirty page in the buffer pool may have been modified by multiple different transactions at different times
UNDO must reverse only the loser transactions' changes — not interfere with pages also modified by winners
If the server crashes during UNDO, recovery must resume from the correct point without re-undoing already-undone operations

ARIES solves all three problems elegantly:

Problem	ARIES Solution
Multiple transactions touched the same page	PageLSN tracks which operation last modified each page — REDO applies changes in LSN order, ensuring correct final state
UNDO must only reverse loser transactions	Analysis phase builds the exact Transaction Table — only transactions without a COMMIT record are in the UNDO set
Crash during UNDO	CLRs (Compensation Log Records) written during UNDO are never themselves undone — they are skipped on the next recovery attempt via the UndoNextLSN pointer
Idempotent REDO on already-flushed pages	PageLSN ≥ LSN_record → skip REDO. Safe even if the same log record is processed multiple times across repeated crash/recovery cycles

Real-World Case Study: The Power Grid Financial Crisis

Aspect	Details
The Setup	A major financial trading platform processes millions of micro-transactions per second across a PostgreSQL cluster. PostgreSQL implements WAL and a ARIES-derived fuzzy checkpoint recovery model.
The Failure	A regional power grid failure simultaneously killed both the primary and failover database servers mid-transaction. Millions of in-flight trades (account debits + credits) were in the buffer pool — never flushed to disk.
The Impact	All in-memory buffer pool data was destroyed. Without WAL, billions of dollars in trades would be unaccounted for — accounts debited without corresponding credits, creating insolvency.
The Recovery	PostgreSQL's WAL had durably recorded every operation. On restart: Analysis identified the crash LSN in 8 seconds. REDO phase rebuilt the exact pre-crash buffer pool state from the WAL in 2 minutes. UNDO phase rolled back 47 incomplete transactions in 12 seconds. Total recovery time: 2 minutes 20 seconds.
The Lesson	A properly tuned WAL + ARIES implementation is mathematically indestructible against crash failures. The only vulnerabilities are: (1) corrupted WAL files (prevented by RAID storage), and (2) software bugs in the recovery code itself (prevented by formal verification).

Key Statistics & Industry Data (2026)

WAL Throughput — Modern NVMe SSDs have accelerated WAL sequential write speeds, allowing high-performance databases to sustain 250,000+ log record commits per second on optimized hardware. (Source: PostgreSQL Performance Benchmarks, 2025)
Checkpoint RTO Improvement — By implementing frequent fuzzy checkpoints (every 5 minutes instead of every 30), enterprise databases reduce their Recovery Time Objective (RTO) by up to 85% — from 30-minute restarts to under 5-minute restarts. (Source: Oracle Database Engineering Blog, 2025)
ARIES Industry Dominance — Over 95% of global enterprise relational databases (PostgreSQL, Oracle, IBM DB2, SQL Server, MySQL InnoDB) use ARIES or a direct derivative as their crash recovery algorithm. (Source: VLDB Survey on RDBMS Internals, 2024)
WAL Replication — WAL streaming replication is used to maintain hot standby replicas with sub-second replication lag in high-availability PostgreSQL deployments — the same mechanism that prevents data loss on primary failure. (Source: AWS RDS PostgreSQL Architecture, 2026)

Where Crash Recovery Mechanisms Are Applied

OLTP Financial Systems
Banking and trading platforms use ARIES/WAL to guarantee that account debit operations are atomically paired with credits — an incomplete transfer due to a crash is detected and reversed during UNDO.
WAL-Based Replication
PostgreSQL Streaming Replication and MySQL Binary Log Replication both use the WAL/binlog as a real-time change feed — replica servers continuously apply REDO from the primary, keeping hot standbys in sync within milliseconds.
Point-in-Time Recovery (PITR)
DBAs restore from a full backup and replay the archived WAL forward to any specific timestamp — recovering the database to the exact second before an accidental DROP TABLE or data corruption event.
Cloud Database Snapshots
AWS RDS, Google Cloud SQL, and Azure SQL Database use WAL-backed snapshots — backup is a fuzzy checkpoint snapshot, and WAL archives allow restoration to any point between snapshots.
Copy-on-Write Filesystems
ZFS and Btrfs filesystems use shadow-paging-inspired Copy-on-Write at the filesystem level — never overwriting existing data blocks, making the entire filesystem crash-consistent without a separate log.

Advantages of Log-Based Recovery (WAL + ARIES)

Mathematical ACID Guarantee — WAL + ARIES provides a formal, provably correct implementation of Atomicity and Durability for any combination of concurrent transactions
Fast Normal Operations — sequential WAL writes are dramatically faster than random data page writes; the database can confirm commits in microseconds while deferring physical page writes
Idempotent Recovery — ARIES's LSN-based PageLSN comparison makes recovery safe even if the server crashes during the recovery process itself — no data corruption from repeated restarts
Replication Foundation — the same WAL used for local crash recovery doubles as the data feed for streaming replication to standby servers, providing both HA and disaster recovery from a single mechanism
Point-in-Time Recovery — WAL archives enable restoration to any arbitrary past moment, not just the most recent backup — a critical capability for accidental data deletion recovery

Limitations & Challenges

Sequential I/O bottleneck — every transaction must wait for at least one synchronous WAL flush to disk before the commit can be acknowledged — on slow disks this becomes the critical path
WAL storage growth — active OLTP systems generate gigabytes of WAL per hour; without automated WAL archival and purging, the WAL directory fills the disk and crashes the database
Recovery time scales with checkpoint interval — infrequent checkpoints mean longer log replay during recovery; tuning checkpoint frequency is a critical DBA responsibility
Complexity — ARIES with CLRs, fuzzy checkpoints, and concurrent transaction handling is one of the most complex algorithms in computer science; bugs in recovery code are catastrophic
Shadow paging fragmentation — the alternative (shadow paging) avoids log overhead but scatters data pages randomly across disk, destroying read performance through fragmentation over time

Quick Reference Cheat Sheet

Complete crash recovery terminology for exams and interviews.

Term	Definition	Exam Tip
WAL	Write-Ahead Logging — log record must be written to durable storage before the data page is modified	The single rule that makes all log-based recovery possible
REDO	Replays committed operations to rebuild data that was in RAM but never flushed to disk before crash	Forward log scan; applies NewValue from log records
UNDO	Reverses operations from uncommitted (loser) transactions to restore pre-transaction state	Backward log scan; applies OldValue from log records
Checkpoint	A logged synchronization point ensuring all earlier dirty pages are safely on disk	Recovery only reads log from the last checkpoint — not from the beginning of time
Fuzzy Checkpoint	Checkpoint that flushes dirty pages gradually in the background while transactions continue	No stall; used by PostgreSQL, MySQL, Oracle to avoid performance spikes
ARIES	Algorithm for Recovery and Isolation Exploiting Semantics (IBM, 1992) — 3 phases: Analysis, REDO, UNDO	REDO repeats history for ALL transactions; UNDO only reverses losers
LSN	Log Sequence Number — monotonically increasing ID assigned to every WAL record	PageLSN ≥ LSN_record → skip REDO (idempotent check)
PageLSN	The LSN of the last WAL record that modified a disk page — stored in the page header	Key field for ARIES idempotence — prevents double-REDO
CLR	Compensation Log Record — written to WAL for every UNDO operation performed during recovery	CLRs are never themselves undone — they make UNDO idempotent
Dirty Page	A buffer pool page modified in RAM but not yet written to the physical database file on disk	Source of REDO need — dirty pages lost at crash must be rebuilt
Winner Tx	A transaction that had a COMMIT record in the log at crash time	Needs REDO only — its effects must be preserved
Loser Tx	A transaction with no COMMIT record in the log at crash time	Needs UNDO — its effects must be completely reversed
Shadow Paging	Writes updates to new disk pages; atomic pointer swap commits; no swap = instant discard on crash	No log needed but causes fragmentation; poor concurrency; mostly obsolete
PITR	Point-in-Time Recovery — restore backup + replay WAL to any specific timestamp	Used to recover from accidental DROP TABLE or data corruption to any historical second

Frequently Asked Questions (FAQ)

What happens if the system crashes while writing the log itself?

Databases append a checksum (CRC) to every log record before writing it. If power is lost mid-write, the partial log record will fail its checksum verification on the next boot. The DBMS recognizes this as a torn or incomplete record, discards it, and treats the corresponding transaction as never having committed — rolling it back. This is why WAL records are written atomically at the operating system block level whenever possible.

Why do we have to REDO transactions that already committed?

To make databases fast, committed data is initially only saved in RAM (as "dirty pages") — only the log record is flushed to disk at commit time. If power is lost, RAM vanishes. The REDO phase uses the durable log to rebuild all data changes that existed in RAM but were never flushed to the actual database files on disk. Without REDO, committed transactions would appear lost after a crash.

What is the difference between a Rollback and Crash Recovery?

A Rollback is a routine, planned reversal of a single transaction while the database is running normally — triggered by an application error, constraint violation, or explicit ROLLBACK statement. Crash Recovery is a system-wide event: the entire server has died and restarted, requiring the DBMS to reconstruct the full state of all active transactions from the WAL before accepting new queries.

What is a Fuzzy Checkpoint?

A traditional (sharp) checkpoint freezes all database write activity while dirty pages are flushed to disk — causing noticeable performance stutters. A Fuzzy Checkpoint allows the database to continue processing transactions while dirty pages are flushed asynchronously in the background. The checkpoint record in the log marks the beginning of the flush, not the completion. PostgreSQL, Oracle, and MySQL all use fuzzy checkpoints to eliminate performance spikes.

What is a Log Sequence Number (LSN)?

An LSN is a monotonically increasing integer assigned to every record written to the WAL. Every data page on disk also stores the PageLSN — the LSN of the last log record that modified it. During ARIES REDO, the engine compares the log record's LSN to the PageLSN: if PageLSN ≥ LSN_record, the page already contains this change and REDO is skipped. This comparison makes ARIES idempotent — safe to restart even if recovery itself crashes.

Is Shadow Paging still used in modern databases?

Shadow Paging is largely obsolete for general-purpose RDBMS due to severe data fragmentation and poor concurrency. However, it survives in specific contexts: SQLite uses a form of shadow paging in its "rollback journal" mode. Some embedded and file-based databases use it for simplicity. The concept also appears in Copy-On-Write (COW) filesystems like ZFS and Btrfs, which apply the same principle at the filesystem level.

What is Point-in-Time Recovery (PITR)?

PITR allows a DBA to restore the database to the exact state it was in at any specific second in the past. It works by restoring a full backup and then replaying the WAL archive forward to the target timestamp — stopping precisely before a catastrophic event (like an accidental DROP TABLE). PostgreSQL, Oracle, and SQL Server all support PITR as an enterprise data protection feature.

Test Your Knowledge

Ready to prove your skills? Take our rigorous multiple-choice quiz designed to test your understanding of this topic and prepare you for interviews.

Start Quiz

Database Crash Recovery: ARIES, WAL & Shadow Paging (2026)

PerfectNotes TeamUpdated June 2026

Key Takeaways

WAL is the Foundation — The Write-Ahead Log mandates that every change must be recorded in a durable log before the data page is modified. This single rule guarantees recoverability from any crash.
REDO Saves Committed Work — After a crash, RAM (and its dirty pages) is wiped. REDO replays every committed operation from the log to rebuild data that existed only in RAM at crash time.
UNDO Removes Incomplete Work — UNDO reverses every operation from transactions that were active (uncommitted) at the moment of the crash — enforcing ACID Atomicity.
ARIES = Analysis + REDO + UNDO — IBM's ARIES algorithm (1992) adds mathematical precision: LSN-based idempotence ensures recovery is safe even if the server crashes during recovery itself.
Shadow Paging is Obsolete — Shadow paging avoids logs but causes severe data fragmentation and cannot handle concurrent transactions at scale. WAL/ARIES dominates 99% of production RDBMS.

What Is Database Crash Recovery?

Atomicity — incomplete (uncommitted) transactions are completely reversed, as if they never occurred
Durability — every committed transaction's effects are permanently preserved, even if the crash happened the millisecond after the commit signal was sent