A SQL query goes through several transformation steps before execution:
SQL Query
↓ 1. Parser & Translator
Relational Algebra Expression (query tree)
↓ 2. Optimizer
Execution Plan (with chosen algorithms)
↓ 3. Evaluator
Query Result
Primary cost metric: Number of block transfers (disk I/Os). In some models, seek time is also counted separately.
Notation used throughout:
b_r = number of disk blocks in relation r
n_r = number of tuples (records) in relation r
M = number of memory buffer blocks available

| Algorithm | Cost (block transfers) | When to Use |
|---|---|---|
| Linear scan | b_r | No index; works for any condition |
| Binary search | ⌈log₂(b_r)⌉ + k | Sorted file, equality/range on sort key (k = additional blocks of matching tuples) |
| Primary index (equality) | h_i + 1 (h_i = B+-tree height) | Primary B+ tree, key attribute |
| Primary index (range) | h_i + n_blocks | Primary B+ tree, range on sort key (n_blocks = blocks of matching tuples) |
| Secondary index (equality, key) | h_i + 1 | Secondary B+ tree, key attribute |
| Secondary index (equality, non-key) | h_i + n_r | Secondary B+ tree, duplicate keys (worst case: one I/O per matching record) |
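The cost formulas in the table can be captured as small helper functions. This is a sketch; the function names are illustrative, and the index lookup uses the "B+-tree height + 1" convention stated later in these notes for primary indexes:

```python
import math

def linear_scan_cost(b_r: int) -> int:
    """Full scan: read every block of the relation."""
    return b_r

def binary_search_cost(b_r: int, k: int = 0) -> int:
    """Binary search on a sorted file: locate the first matching
    block, then read k further blocks of matching tuples."""
    return math.ceil(math.log2(b_r)) + k

def primary_index_equality_cost(height: int) -> int:
    """B+-tree primary index, equality on a key attribute:
    traverse the tree (height I/Os), then fetch one data block."""
    return height + 1
```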
Conjunctive selection: condition is c1 AND c2 AND ... — evaluate the most selective condition first (ideally via an index), then test the remaining conditions on the fetched tuples in memory.
Disjunctive selection: condition is c1 OR c2 OR ... — index-based evaluation helps only if every condition has an index (take the union of the retrieved record pointers); otherwise a linear scan is required.
External merge sort: used when the relation to be sorted is too large to fit in memory.
Cost = 2 × b_r × (1 + number of merge passes)
= 2b_r × (1 + ⌈log_{M-1}(⌈b_r/M⌉)⌉)
(The factor 2: every block is read and written once per pass. The (1 + …) term counts the initial run-creation pass plus the ⌈log_{M−1}(⌈b_r/M⌉)⌉ merge passes.)
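The formula can be checked by counting merge passes directly, which avoids floating-point log issues. A sketch, with `external_sort_cost` as an illustrative name:

```python
import math

def external_sort_cost(b_r: int, M: int) -> int:
    """Block transfers for external merge sort:
    2 * b_r * (1 + ceil(log_{M-1}(ceil(b_r / M)))).
    b_r = blocks in the relation, M = memory buffer blocks."""
    runs = math.ceil(b_r / M)      # sorted runs after the initial pass
    passes = 0
    while runs > 1:                # each merge pass combines M-1 runs
        runs = math.ceil(runs / (M - 1))
        passes += 1
    return 2 * b_r * (1 + passes)
```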
for each tuple t_r in r: ← outer relation
for each tuple t_s in s: ← inner relation
if t_r and t_s satisfy the join condition: output (t_r, t_s)
Cost:
n_r × b_s + b_r (worst case; if the smaller relation fits entirely in memory, only b_r + b_s)
When to use: When one relation is tiny enough to keep in memory; no index available.
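A minimal sketch of the tuple-at-a-time nested loop join over in-memory lists of dict tuples; an equi-join condition stands in for the general join predicate, and all names are illustrative:

```python
def nested_loop_join(r, s, attr_r, attr_s):
    """Nested loop join: r is the outer relation, s the inner.
    Every tuple of s is scanned once per tuple of r."""
    out = []
    for t_r in r:                       # outer relation
        for t_s in s:                   # inner relation: full scan each time
            if t_r[attr_r] == t_s[attr_s]:
                out.append({**t_r, **t_s})
    return out
```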
for each block B_r of r:
for each block B_s of s:
for each tuple t_r in B_r:
for each tuple t_s in B_s:
if t_r and t_s satisfy the join condition: output (t_r, t_s)
Cost:
⌈b_r / (M−2)⌉ × b_s + b_r (M − 2 buffer blocks hold chunks of r; one block each for s and for output)
Optimization: If M is large enough to hold all of r → one pass over s.
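The same loop structure at block granularity can be sketched as follows. Blocks are modeled as lists of dict tuples; for brevity the sketch reads one r-block per outer iteration rather than M − 2:

```python
def block_nested_loop_join(r_blocks, s_blocks, key):
    """Block nested loop join over blocks of dict tuples.
    Each outer block triggers exactly one full scan of s,
    and tuples are matched in memory within the block pair."""
    out = []
    for B_r in r_blocks:                # one block of the outer relation
        for B_s in s_blocks:            # full pass over the inner relation
            for t_r in B_r:
                for t_s in B_s:
                    if t_r[key] == t_s[key]:
                        out.append({**t_r, **t_s})
    return out
```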
for each tuple t_r in r:
use index on s to find tuples matching t_r's join attribute
Cost:
b_r + n_r × cost_per_index_lookup
cost_per_index_lookup = B+-tree height + 1 for a primary index; may be higher for a secondary index
1. Sort r on join attribute
2. Sort s on join attribute
3. Merge the two sorted files (like merge phase of merge-sort)
Cost:
cost_of_sorting_r + cost_of_sorting_s + b_r + b_s
≈ 2b_r(1 + log_{M-1}(b_r/M)) + 2b_s(1 + log_{M-1}(b_s/M)) + b_r + b_s
When to use: inputs are already sorted on the join attribute, or sorted output is needed by a later operation.
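The merge phase (step 3) can be sketched in Python for already-sorted lists of dicts; handling groups of duplicate join keys is the only subtle part:

```python
def merge_join(r_sorted, s_sorted, key):
    """Merge step of sort-merge join. Both inputs are lists of
    dict tuples already sorted on `key`. Each group of equal
    r-keys is joined with the matching group of s-keys."""
    out, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        kr, ks = r_sorted[i][key], s_sorted[j][key]
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # find the end of the equal-key group on each side
            i2 = i
            while i2 < len(r_sorted) and r_sorted[i2][key] == kr:
                i2 += 1
            j2 = j
            while j2 < len(s_sorted) and s_sorted[j2][key] == kr:
                j2 += 1
            for t_r in r_sorted[i:i2]:
                for t_s in s_sorted[j:j2]:
                    out.append({**t_r, **t_s})
            i, j = i2, j2
    return out
```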
Partition phase: hash r on the join attribute into partitions r_0, r_1, ..., r_{n-1}, and hash s with the same hash function into s_0, s_1, ..., s_{n-1}
Build & probe phase: for each partition i:
    load r_i into memory and build an in-memory hash table on it (build)
    for each tuple of s_i: look up matching r_i tuples in the hash table (probe)
Cost:
3(b_r + b_s), assuming each partition of r fits in memory (roughly b_r / M ≤ M − 2, i.e. b_r on the order of M² or less)
Condition: average partition size b_r / (number of partitions) ≤ M − 2. If a partition of r doesn’t fit: use recursive partitioning (extra passes over both relations).
When to use: Large equi-joins without existing sorted order; typically fastest for large relations.
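A sketch of the one-pass case, where the whole build relation fits in memory so no partitioning is needed; names are illustrative:

```python
from collections import defaultdict

def hash_join(r, s, key):
    """In-memory hash join: build a hash table on r's join
    attribute, then probe it with each tuple of s."""
    table = defaultdict(list)
    for t_r in r:                       # build phase
        table[t_r[key]].append(t_r)
    out = []
    for t_s in s:                       # probe phase
        for t_r in table.get(t_s[key], []):
            out.append({**t_r, **t_s})
    return out
```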
| Algorithm | Cost | Best When |
|---|---|---|
| Nested Loop | n_r × b_s + b_r | Outer is tiny |
| Block Nested Loop | ⌈b_r/(M-2)⌉ × b_s + b_r | Limited memory, no index |
| Indexed Nested Loop | b_r + n_r × idx_cost | Index on inner join attr |
| Sort-Merge | Sort costs + b_r + b_s | Pre-sorted or need sorted output |
| Hash Join | 3(b_r + b_s) | Large equi-join, enough memory |
Given: r has 5 blocks, s has 9 blocks, M = 3 buffer blocks
Check feasibility:
    min(b_r, b_s) / M = 5/3 ≈ 1.67 → about 2 partitions needed
    With M = 3: number of partitions = M − 1 = 2
    Each r-partition ≈ 5/2 = 2.5 blocks, but a partition must fit in M − 2 = 1 buffer block → doesn’t fit with M = 3, so recursive partitioning (or more memory) is needed
(For M large enough, the formula gives 3(b_r + b_s) = 3 × 14 = 42 block transfers)
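The feasibility check from this example as a tiny helper (illustrative name; uses the M − 1 partitions and M − 2 usable buffer blocks convention from above):

```python
import math

def hash_join_feasible(b_build, M):
    """One-pass hash join condition: with M buffers we get about
    M - 1 partitions, and each partition of the build relation
    must fit in the M - 2 blocks available during the probe."""
    partitions = M - 1
    avg_partition = math.ceil(b_build / partitions)
    return avg_partition <= M - 2
```

For the example above, `hash_join_feasible(5, 3)` is `False`: each of the 2 partitions needs about 3 blocks but only 1 buffer block is free.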
These rules allow reordering operations without changing results:
| Rule | Statement |
|---|---|
| Cascading σ | σ_{c1 AND c2}(R) ≡ σ_{c1}(σ_{c2}(R)) |
| Commutativity of σ | σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R)) |
| Commutativity of ⋈ | R ⋈ S ≡ S ⋈ R |
| Associativity of ⋈ | (R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T) |
| Push σ through ⋈ | σ_c(R ⋈ S) ≡ σ_c(R) ⋈ S (if c only involves R’s attrs) |
| Push π through ⋈ | π_L(R ⋈ S) ≡ π_{L∩R}(R) ⋈ π_{L∩S}(S) (valid when the join attributes are kept in both projection lists) |
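The push-σ-through-⋈ rule can be checked concretely on toy relations. `select` and `join` are illustrative helpers over dict tuples, and the predicate touches only r's attributes, as the rule requires:

```python
def select(rel, pred):
    """sigma: keep tuples satisfying the predicate."""
    return [t for t in rel if pred(t)]

def join(r, s, key):
    """Equi-join on a shared attribute name."""
    return [{**t_r, **t_s} for t_r in r for t_s in s if t_r[key] == t_s[key]]

r = [{"k": 1, "a": 10}, {"k": 2, "a": 20}]
s = [{"k": 1, "b": 1}, {"k": 2, "b": 2}]
pred = lambda t: t["a"] > 15            # uses only r's attributes

plan1 = select(join(r, s, "k"), pred)   # filter after the join
plan2 = join(select(r, pred), s, "k")   # filter pushed below the join
assert plan1 == plan2                   # same result, fewer tuples joined in plan2
```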
Applied to the initial query tree to produce a better plan before cost estimation:
After heuristic simplification, evaluate multiple join orderings using estimated costs:
Statistics in the catalog:
n_r: number of tuples in r
b_r: number of blocks in r
V(A, r): number of distinct values of attribute A in r

Selectivity of condition c:
A = v: selectivity ≈ 1/V(A, r) → estimated result size = n_r / V(A, r)
A < v: selectivity ≈ (v − min_A) / (max_A − min_A)

For a query joining n relations, there are n! possible orderings, and each ordering can use different algorithms.
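The two selectivity estimates as functions (a sketch; the uniform-distribution assumption behind the range estimate is made explicit):

```python
def estimate_equality(n_r, V_A):
    """Estimated result size of sigma_{A = v}(r): n_r / V(A, r)."""
    return n_r / V_A

def estimate_range(n_r, v, min_A, max_A):
    """Estimated result size of sigma_{A < v}(r), assuming A is
    uniformly distributed between min_A and max_A."""
    if v <= min_A:
        return 0
    if v >= max_A:
        return n_r
    return n_r * (v - min_A) / (max_A - min_A)
```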
Left-deep join trees: Only consider plans where the right operand of each join is a base relation (not an intermediate result). This allows pipelining — the left operand can stream results into the next join.
Example: 6 relations → 6! = 720 left-deep orderings. With dynamic programming, the search space becomes manageable.
Pipelined evaluation passes tuples between operators through the iterator interface: open(), get_next(), close().

A materialized view stores the precomputed result of a query. When the base tables change, the view must be updated.
Instead of recomputing the entire view, propagate only the changes.
For insert of tuple t into r (r is used in view v = r ⋈ s):
New tuples to add to v = {t} ⋈ s
v_new = v_old ∪ ({t} ⋈ s)
For delete of tuple t from r:
Tuples to remove from v = {t} ⋈ s
v_new = v_old − ({t} ⋈ s)
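A sketch of incremental maintenance for v = r ⋈ s over lists of dict tuples. Set semantics are used for brevity; real systems must handle duplicates, e.g. with multiset counts:

```python
def join(r, s, key):
    """Equi-join on a shared attribute name."""
    return [{**t_r, **t_s} for t_r in r for t_s in s if t_r[key] == t_s[key]]

def view_insert(v, t, s, key):
    """On insert of t into r: add {t} JOIN s to the view
    instead of recomputing it from scratch."""
    return v + join([t], s, key)

def view_delete(v, t, s, key):
    """On delete of t from r: remove {t} JOIN s from the view."""
    delta = join([t], s, key)
    return [row for row in v if row not in delta]
```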
For aggregation (e.g., sum): maintain the running sum and a tuple count per group; an insert adds the new value to the group’s sum, a delete subtracts it, and the group is dropped when its count reaches 0.
Why materialized views? They trade storage space and update-time maintenance for much faster reads: an expensive join or aggregate is computed once and reused by many queries.
Given relations with block counts and memory M, compute the cheapest join algorithm:
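A sketch of that comparison using the cost formulas from the summary table (function and parameter names are illustrative):

```python
import math

def cheapest_join(b_r, b_s, n_r, M, idx_cost=None):
    """Return (name, block transfers) of the cheapest join per the
    summary-table formulas. r is the outer/build relation; idx_cost
    is the per-lookup cost if s has an index on the join attribute."""
    costs = {
        "nested loop": n_r * b_s + b_r,
        "block nested loop": math.ceil(b_r / (M - 2)) * b_s + b_r,
        "hash join": 3 * (b_r + b_s),
    }
    if idx_cost is not None:
        costs["indexed nested loop"] = b_r + n_r * idx_cost
    return min(costs.items(), key=lambda kv: kv[1])
```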
Heuristic query tree transformation: apply the equivalence rules above — cascade and push selections down, push projections down — then cost-estimate join orderings on the simplified tree.