Proposed by E.F. Codd (1970), the relational model organizes data as a collection of relations (tables). It provides:
| Formal Term | Common Term | Meaning |
|---|---|---|
| Relation | Table | A named, 2-dimensional structure |
| Tuple | Row / Record | A single data entry |
| Attribute | Column / Field | A named property |
| Domain | Data type | Set of permitted values for an attribute |
| Degree | Arity | Number of attributes in a relation |
| Cardinality | — | Number of tuples in a relation |
| Key Type | Definition |
|---|---|
| Superkey | Any attribute set that uniquely identifies tuples |
| Candidate key | Minimal superkey (no proper subset is also a superkey) |
| Primary key (PK) | Chosen candidate key used to identify tuples |
| Foreign key (FK) | Attribute(s) in one relation referencing the PK of another |
| Prime attribute | Attribute that is part of some candidate key |
| Non-prime attribute | Attribute that is not part of any candidate key |
Entity Integrity: Primary key attributes must NOT be NULL. Every tuple must have a valid, non-null identifier.
Referential Integrity: If a foreign key value exists in a referencing table, the referenced value must exist in the referenced table (or the FK must be NULL).
Referential Integrity Violation Handling:
RESTRICT (default): reject the operationCASCADE: propagate the change (delete/update) to referencing rowsSET NULL: set FK to NULL in referencing rowsSET DEFAULT: set FK to a default valueNULL represents unknown or not applicable — it is NOT zero, empty string, or “N/A”.
When NULLs are involved in comparisons, the result can be TRUE, FALSE, or UNKNOWN.
AND table:
| AND | T | F | U |
|---|---|---|---|
| T | T | F | U |
| F | F | F | F |
| U | U | F | U |
OR table:
| OR | T | F | U |
|---|---|---|---|
| T | T | T | T |
| F | T | F | U |
| U | T | U | U |
NOT:
Critical rule: A WHERE clause only passes tuples where the condition evaluates to TRUE — tuples where condition is UNKNOWN are excluded.
Comparison with NULL:
salary = NULL → UNKNOWN (not TRUE or FALSE!)NULL = NULL → UNKNOWNIS NULL or IS NOT NULL for proper NULL checksCOUNT(*) counts all rows including NULLsCOUNT(attribute) counts only non-NULL valuesSUM, AVG, MIN, MAX all ignore NULLsGROUP BY treats NULLs as equal (NULLs group together)Relational Algebra is a procedural query language — you specify HOW to retrieve data step by step. It operates on relations and returns relations (closure property).
Selects tuples that satisfy a condition.
σ_condition(R)
σ_(salary > 50000)(EMPLOYEE) — all employees earning more than 50KSelects specific attributes (columns).
π_attr1, attr2, ...(R)
π_(name, salary)(EMPLOYEE)Renames a relation or its attributes.
ρ_S(R) — renames relation R to S
ρ_S(A1, A2)(R) — renames relation and attributes
Returns all tuples in R or S (or both).
R ∪ S
Returns tuples in R that are NOT in S.
R − S
Combines every tuple of R with every tuple of S.
R × S
Joins relations on all common attributes, keeping only matching tuples.
R ⋈ S
Equivalent to: π_combined_attrs(σ_R.A=S.A(R × S)) for shared attribute A
Join with a general condition θ.
R ⋈_θ S = σ_θ(R × S)
Theta join where θ uses only equality conditions.
Returns tuples in R that match ALL tuples in S.
R ÷ S
Example: Find students who have taken ALL courses offered:
ENROLLMENT(student_id, course_id) ÷ COURSE(course_id)
Returns tuples present in both R and S.
R ∩ S = R − (R − S)
| Type | Tuples Kept |
|---|---|
| Left Outer Join (⟕) | All from left + matching from right; unmatched right → NULLs |
| Right Outer Join (⟖) | All from right + matching from left; unmatched left → NULLs |
| Full Outer Join (⟗) | All from both; unmatched padded with NULLs |
RC is a declarative query language — you specify WHAT you want, not how to compute it.
{t | P(t)}
Returns all tuples t that satisfy predicate P.
Example: Find all employees with salary > 50000:
{t | t ∈ EMPLOYEE ∧ t.salary > 50000}
Example: Find names of employees who work in dept D5:
{t.name | t ∈ EMPLOYEE ∧ t.dept_id = 'D5'}
Existential (∃): “there exists some t such that...”
{t.name | t ∈ EMPLOYEE ∧ ∃d (d ∈ DEPARTMENT ∧ d.dept_id = t.dept_id ∧ d.location = 'Houston')}
Universal (∀): “for all t...”
Used for division-like queries (“for every”).
Variables range over attribute values (domains) rather than tuples.
{<x1, x2, ...> | P(x1, x2, ...)}
Forms the basis for QBE (Query-by-Example).
An RC expression is safe if it is guaranteed to produce a finite result. Unsafe expressions reference infinite domains (e.g., “all tuples NOT in R”). The relational algebra is inherently safe.
Any query expressible in Relational Algebra can be expressed in both TRC and DRC, and vice versa. They are relationally complete — equivalent in expressive power.
Pure relational algebra uses set semantics (no duplicates). SQL uses bag semantics (duplicates allowed unless explicitly eliminated).
| Operation | Set (RA) | Bag (SQL) |
|---|---|---|
| SELECT | σ (deduplicates) | SELECT (keeps duplicates) |
| Distinct | implicit | SELECT DISTINCT |
| UNION | eliminates duplicates | UNION ALL keeps duplicates; UNION eliminates |
| Intersection | eliminates duplicates | INTERSECT |
| Difference | eliminates duplicates | EXCEPT |
RA expression writing: Given a schema, write RA expressions for:
π_name(σ_(dept_id='D5' ∧ salary>50000)(EMPLOYEE))(π_(emp_id, proj_id)(WORKS_ON)) ÷ (π_proj_id(σ_(location='Houston')(PROJECT)))
then join back to get names.3VL truth table: Evaluate compound conditions involving NULLs step by step.