System crashes can leave the database in an inconsistent state because a transaction may be partially executed at the time of failure. Crash recovery ensures:
Types of failures:
A log (journal) on stable storage records every database modification before it happens.
| Record | Meaning |
|---|---|
[START Ti] |
Transaction Ti has begun |
[Ti, X, old_val, new_val] |
Ti updated item X from old_val to new_val |
[COMMIT Ti] |
Ti has successfully committed |
[ABORT Ti] |
Ti has been aborted |
[START CKPT (T1, T2, ...)] |
Checkpoint started; listed transactions are currently active |
[END CKPT] |
Checkpoint completed |
Stable storage: Assumed to survive any system crash (typically RAID + battery backup). Log writes are always to stable storage before data is modified.
Recovery requirements:
Best suited for: Long-lived transactions with few rollbacks (PYQ 23-24 Q13) — the overhead of buffering all updates in memory is acceptable when rollbacks are rare.
Write-Ahead Logging (WAL): The log record for any modification must be written to stable storage before the corresponding data page is written to disk.
This guarantees:
WAL is the fundamental correctness requirement for log-based recovery.
Purpose: Limit the amount of log that needs to be replayed after a crash — avoids scanning from the beginning of the log.
Two conditions to complete a checkpoint:
After the checkpoint, any transaction that committed before END CKPT is guaranteed to have its changes on disk — no REDO needed for those.
[START CKPT (T1, T2, ...)][START Ti] for each transaction active at checkpoint| Classification | Condition | Action |
|---|---|---|
| REDO | [COMMIT Ti] found in log |
Re-apply all new values (scan forward) |
| UNDO | [START Ti] found but NO [COMMIT Ti] |
Remove partial changes (restore old values, scan backward) |
[Ti, X, old, new] where Ti ∈ undo-list → set X = old_val[Ti, X, old, new] where Ti ∈ redo-list → set X = new_valLog (top = earliest, bottom = last before crash):
[START T1]
[START T2]
[T1, A, 10, 20]
[T2, B, 15, 25]
[T1, C, 30, 40]
[COMMIT T1]
[START T3]
[T2, A, 20, 30]
[T3, C, 40, 50]
[COMMIT T2]
[START T4]
[T4, A, 30, 35]
[START CKPT (T3, T4)]
[END CKPT]
[COMMIT T3]
─── CRASH ───
Classification:
| Transaction | Commit in Log? | Active at CKPT? | Result |
|---|---|---|---|
| T1 | Yes (before CKPT) | No | REDO |
| T2 | Yes (before CKPT) | No | REDO |
| T3 | Yes (after CKPT) | Yes | REDO |
| T4 | No | Yes | UNDO |
Undo phase (scan backwards for T4):
[T4, A, 30, 35] → restore A = 30Redo phase (scan forwards from START T1):
Final database state after recovery:
| Item | Value |
|---|---|
| A | 30 |
| B | 25 |
| C | 50 |
Question: How do you undo a transaction that has already committed?
Answer: Use a compensating (compensatory) transaction — a new transaction that performs the inverse operations.
You cannot simply “undo” a committed transaction because:
A compensating transaction applies the logical inverse (e.g., if T inserted row R, the compensating transaction deletes R). This creates a proper audit trail and maintains log integrity.
| Problem | Explanation |
|---|---|
| Poor concurrency support | Designed for single-user/single-transaction; multi-transaction coordination is complex |
| Fragmentation | After multiple commits, logically related pages scatter across disk; hurts sequential I/O |
| High overhead for large DBs | Copying and swapping huge page tables on every commit is expensive |
| No fine-grained undo | Cannot undo individual operations; only all-or-nothing rollback |
Suitable for: File systems (journaling), SQLite rollback journal, single-user embedded databases.
Purpose: Checkpointing limits the amount of log that must be scanned (redone) during recovery.
Without checkpoints, recovery would require replaying the log from the very beginning — impractical as databases grow.
Two conditions that must be met to complete a checkpoint:
A transaction T2 computes a sum while T1 transfers 100 between accounts.
Initial state: X = 500, Z = 250
| Time | T1 | T2 |
|---|---|---|
| t1 | Begin | |
| t2 | Read(X) = 500 | Begin; Sum = 0 |
| t3 | X = 500 − 100 = 400 | Read(X) = 500; Sum += 500 → Sum = 500 |
| t4 | Write(X=400) | |
| t5 | Read(Z) = 250 | |
| t6 | Z = 250 + 100 = 350 | |
| t7 | Write(Z=350); Commit | Read(Z) = 350; Sum += 350 → Sum = 850 |
| t8 | Write(Sum=850); Commit |
Result: T2 computes Sum = 850 instead of 750.
Explanation: T2 read X before T1’s debit but Z after T1’s credit — it observed a mix of old and new values. This is the Inconsistent Analysis problem (also called unrepeatable read or phantom read). Concurrency control (e.g., 2PL) prevents this by ensuring T2 either sees the old state or the new state, but not a mix.
Full recovery walkthrough from log:
Check if a recovery scheme maintains WAL: