A poorly designed schema leads to update anomalies:
| Anomaly | Description | Example |
|---|---|---|
| Insertion | Can’t add data without unrelated data | Can’t add a department without an employee |
| Deletion | Deleting data loses unrelated data | Deleting last employee also deletes department info |
| Modification | Updating one fact requires many changes | Employee’s department name appears in many rows |
The goal of normalization is to eliminate these anomalies by decomposing relations into smaller, well-structured ones.
A functional dependency (FD) X → Y holds in relation R if, for any two tuples t1 and t2 in R, whenever t1[X] = t2[X], then t1[Y] = t2[Y].
“X determines Y” — knowing the value of X uniquely determines the value of Y.
Example: In EMPLOYEE(emp_id, name, dept_id, dept_name):
emp_id → name (knowing emp_id tells us the name)dept_id → dept_name (knowing dept_id tells us dept_name)emp_id → dept_id (emp works in one dept)emp_id → dept_name (transitively){A, B} → A (always holds, vacuously true)Given a composite key {A, B} → C:
A complete and sound set of inference rules for FDs:
| Rule | Statement | Note |
|---|---|---|
| Reflexivity | If Y ⊆ X, then X → Y | Generates trivial FDs |
| Augmentation | If X → Y, then XZ → YZ | Add same attrs to both sides |
| Transitivity | If X → Y and Y → Z, then X → Z | Chain rule |
| Rule | Statement | |——|———–| | Union | If X → Y and X → Z, then X → YZ | | Decomposition | If X → YZ, then X → Y and X → Z | | Pseudo-transitivity | If X → Y and WY → Z, then WX → Z |
Purpose: Determine what attributes are functionally determined by a set X, given a set F of FDs.
Algorithm: Closure(X, F)
X⁺ = X
repeat:
for each FD (Y → Z) in F:
if Y ⊆ X⁺:
X⁺ = X⁺ ∪ Z
until X⁺ does not change
return X⁺
Use cases:
Worked example: Schema: R(A, B, C, D, E), FDs: {A→BC, CD→E, B→D, E→A}
Find A⁺:
Algorithm:
Example: R(A, B, C, D), FDs: {AB→C, C→D, D→A}
Definition: All attribute values are atomic — no multi-valued or composite attributes.
Every relation in the relational model is technically in 1NF by definition (attributes are atomic domains). 1NF is violated when:
Fix: Flatten composite attributes, create separate tables for multi-valued attributes.
Definition: In 1NF AND every non-prime attribute is fully functionally dependent on every candidate key (no partial dependencies).
Violation: A non-prime attribute depends on only part of a composite candidate key.
Example violation:
ENROLLMENT(student_id, course_id, grade, course_name)
FDs: (student_id, course_id) → grade
course_id → course_name ← partial dependency!
course_name depends on just course_id, not the full key.
Fix: Decompose to remove partial dependency:
ENROLLMENT(student_id, course_id, grade)
COURSE(course_id, course_name)
Note: If a relation has a single-attribute candidate key, it’s automatically in 2NF (no partial dependencies possible).
Definition: In 2NF AND no non-prime attribute is transitively dependent on any candidate key.
Violation: A non-prime attribute depends on another non-prime attribute (which depends on the key).
Example violation:
EMPLOYEE(emp_id, dept_id, dept_name)
FDs: emp_id → dept_id
dept_id → dept_name ← dept_name transitively depends on emp_id!
Fix: Decompose:
EMPLOYEE(emp_id, dept_id)
DEPARTMENT(dept_id, dept_name)
3NF Formal Definition (for exam): For every non-trivial FD X → A in R:
Definition: For every non-trivial FD X → Y in R, X must be a superkey.
BCNF is stricter than 3NF — it removes the “A is prime” exception.
Example: A relation can be in 3NF but not BCNF when:
TEACHING(student, course, teacher)
FDs: (student, course) → teacher ← (s,c) is a CK
teacher → course ← teacher is not a superkey, but course is prime!
This is in 3NF (course is prime attribute) but NOT in BCNF (teacher → course violates BCNF).
Given relation R and FDs F:
result = {R}
while some relation Ri in result is not in BCNF:
find a non-trivial FD X → Y in Ri that violates BCNF (X is not superkey)
decompose Ri into:
R1 = X ∪ Y (X → Y, so X is key of R1)
R2 = Ri − Y (contains all other attributes + X)
replace Ri with R1 and R2 in result
return result
Guarantee: BCNF decomposition always produces lossless-join decomposition. But it may NOT preserve all FDs.
Trade-off: BCNF vs 3NF:
A decomposition of R into R1 and R2 is lossless iff:
R1 ∩ R2 → R1 OR R1 ∩ R2 → R2
i.e., the common attributes form a superkey of at least one of the sub-relations.
Intuition: If the common attributes uniquely identify tuples in one of the relations, we can always reconstruct R exactly by joining R1 and R2.
Lossy decomposition: The natural join of R1 and R2 produces spurious tuples — fake rows that weren’t in the original relation.
A decomposition {R1, R2, …, Rk} of R preserves dependencies iff:
(F1 ∪ F2 ∪ ... ∪ Fk)⁺ = F⁺
where Fi is the projection of F onto Ri (FDs that only involve attributes of Ri).
Why it matters: If a FD X → Y is not preserved in any sub-relation, enforcing it requires joining multiple relations — expensive!
Checking: For each original FD X → Y:
A canonical cover Fc of F is a minimal set of FDs equivalent to F (same closure). Used in 3NF synthesis.
Algorithm:
| Remove extraneous LHS attributes: For each FD X → A where | X | > 1: |
Result: Fc is equivalent to F but has no redundant attributes on LHS, no redundant FDs, and single attributes on RHS.
Guarantees both lossless join AND dependency preservation.
1. Compute canonical cover Fc of F
2. For each FD X → A in Fc:
create relation schema R_i = X ∪ A with FD X → A
3. If no schema in result contains a candidate key of R:
add a schema containing a candidate key of R (ensures lossless join)
4. Remove any schema that is a subset of another schema
Key property: 3NF synthesis always produces a decomposition that is:
Full normalization walkthrough (most common exam type):
Canonical cover computation:
Lossless join test after decomposition: