Introduction to the Relational Model: Keys, Constraints & Integrity (2026)
Key Takeaways
- Relational Model β Data is organized into relations (tables) made of tuples (rows) and attributes (columns), each with a strictly defined domain. Proposed by E.F. Codd in 1970.
- Key Hierarchy β Super Key β Candidate Key β Primary Key. A foreign key links two relations. Choosing the right key type is the foundation of every database schema.
- Integrity Rules β Entity integrity bans NULL in primary keys. Referential integrity ensures every foreign key points to a real parent row or is NULL.
- Constraint Enforcement β On delete/update violations, DBMSs apply one of four actions: RESTRICT, CASCADE, SET NULL, or SET DEFAULT.
- ER Mapping β Every ER diagram entity becomes a table; 1:N relationships become foreign keys; M:N relationships require a junction table.
The relational model organizes data into tables (relations) of rows (tuples) and columns (attributes) with defined domains
A candidate key is the minimal subset of attributes that uniquely identifies every row β the chosen one becomes the primary key
Entity integrity: primary key columns can never be NULL. Referential integrity: every foreign key must match a parent primary key or be NULL
RESTRICT blocks illegal deletes; CASCADE propagates them; SET NULL and SET DEFAULT handle orphans gracefully
ER-to-Relational mapping converts every entity to a table, 1:N relationships to foreign keys, and M:N relationships to junction tables
Relational databases power 60%+ of all enterprise data workloads β mastering this model is foundational for any software engineer or DBA
What is the Relational Model?
In 1970, IBM researcher Edgar F. Codd published a landmark paper proposing a radically new way to store and query data β the relational model. Instead of navigating chains of physical pointers (as in the dominant network and hierarchical models), Codd proposed organizing everything into flat, self-describing tables, and letting a mathematical language called relational algebra do the querying.
The relational model became the foundation of SQL and every major database system in use today β PostgreSQL, MySQL, Oracle, SQL Server, and SQLite all implement it. Understanding it from first principles is the difference between writing queries by guessing and designing databases by understanding.
The model has four core components: domains (the legal value sets), attributes (named columns), tuples (rows), and relations (the tables themselves). Every piece of data in a relational database can be located by specifying its table, its row (via a key), and its column.
Formal Relational Model: Domains, Tuples & Relations
Before writing a single line of SQL, a database designer works at the mathematical level of the relational model. Every component has a precise definition that directly maps to what you see in a database tool.
Domain
A domain is the complete set of all possible atomic (indivisible) values that an attribute may hold. For example, the domain of a StudentAge attribute might be all integers between 5 and 150. The domain acts as the first line of data validation β the DBMS will reject any value outside it.
Crucially, domains enforce atomicity: values cannot be composite or multi-valued. An attribute like PhoneNumbers containing a comma-separated list violates the relational model and is a classic First Normal Form (1NF) violation.
Tuple
A tuple is an ordered list of values, one for each attribute in the relation, where each value must come from that attribute's domain (or be NULL if allowed). A tuple corresponds to one row in a database table β for example, the student record (101, 'Alice', 21, 'Computer Science').
Unlike rows in a file system, tuples in a relation are unordered β the relational model makes no promise about which row comes "first". Queries that assume ordering without an explicit ORDER BY clause are logically incorrect.
Relation Schema vs. Relation Instance
These two terms are consistently confused in exams and interviews. They describe the same table at two different levels:
| Concept | Definition | Analogy | Changes Over Time? |
|---|---|---|---|
| Relation Schema | The structural blueprint: R(Aβ, Aβ, ..., Aβ) with attribute names and domains | The column headers and data-type definitions | Rarely (only on ALTER TABLE) |
| Relation Instance | The current snapshot of all tuples in the relation at a given moment | The actual rows of data right now | Constantly (INSERT, UPDATE, DELETE) |
Example: Student(StudentID: INT, Name: VARCHAR(100), Age: INT, Department: VARCHAR(50)) is the schema. The table currently containing 5,000 student rows is the instance.
Key Constraints: Super Key, Candidate Key, Primary Key & Foreign Key
Keys are the mechanism the relational model uses to uniquely identify tuples and link relations together. The four key types form a strict hierarchy from most general to most specific.
Super Key
A super key is any set of one or more attributes whose combined values uniquely identify every tuple in a relation. Super keys may contain redundant attributes. For a Student table with attributes (StudentID, Name, Email):
{StudentID}β a super key (unique on its own){StudentID, Name}β also a super key (redundant, but still unique){StudentID, Name, Email}β also a super key (the entire row is always unique)
Candidate Key
A candidate key is a minimal super key β removing any single attribute from it would destroy its uniqueness property. Every relation must have at least one candidate key. A table may have multiple candidate keys if multiple attributes (or attribute combinations) uniquely identify rows.
Example: In a Student table, both StudentID and Email might be candidate keys β each uniquely identifies a student on its own.
Primary Key
The primary key is the one candidate key that the database designer selects as the official row identifier. The choice is a design decision based on stability (values that rarely change), simplicity (prefer single-attribute keys), and meaningfulness.
Primary key rules enforced by every RDBMS:
- Uniqueness: No two rows may share the same primary key value.
- No NULL: Primary key attributes must never be NULL (entity integrity).
- Immutability: By convention, primary key values should not change after insertion to avoid cascading reference problems.
Foreign Key
A foreign key is an attribute (or set of attributes) in one relation (the referencing or child relation) that references the primary key of another relation (the referenced or parent relation). Foreign keys are how the relational model represents relationships between tables without using physical pointers.
Example: An Enrollment table may have a StudentID foreign key referencing Student.StudentID, and a CourseID foreign key referencing Course.CourseID.
Integrity Constraints: Domain, Entity & Referential
Integrity constraints are rules enforced by the DBMS to guarantee that data always remains accurate and consistent, even when multiple users are inserting and updating simultaneously. There are three fundamental constraint types in the relational model:
1. Domain Constraint
A domain constraint specifies that each attribute value must be an atomic value drawn from that attribute's defined domain. It is the simplest constraint β the DBMS will reject any INSERT or UPDATE that tries to place a value outside the domain.
Examples: An Age column defined as INT CHECK (Age > 0 AND Age < 150) will reject -5 or 'twenty'. A Gender column defined as ENUM('M', 'F', 'Other') will reject any value not in that set.
2. Entity Integrity Constraint
The entity integrity constraint states: no attribute that is part of the primary key may take a NULL value. This rule guarantees that every tuple is uniquely identifiable. A row with a NULL primary key literally cannot be referenced by any foreign key, making it effectively invisible to any join.
3. Referential Integrity Constraint
The referential integrity constraint states: every non-NULL foreign key value in a child table must match exactly one primary key value in the referenced parent table. This prevents orphan records β rows in a child table that point to non-existent parents.
Example: If Enrollment.StudentID = 999 but there is no student with StudentID = 999 in the Student table, that enrollment record is an orphan and violates referential integrity.
| Constraint Type | Rule | What It Prevents | SQL Keyword |
|---|---|---|---|
| Domain | Value must be in the attribute's defined domain | Invalid data types, out-of-range values | CHECK, ENUM |
| Entity Integrity | Primary key attributes can never be NULL | Unidentifiable rows, broken references | PRIMARY KEY |
| Referential Integrity | Foreign key must match a parent PK or be NULL | Orphan records, broken relationships | FOREIGN KEY ... REFERENCES |
| Key (Uniqueness) | No two rows share the same primary/candidate key value | Duplicate rows, ambiguous identity | UNIQUE, PRIMARY KEY |
Enforcing Integrity Constraints: RESTRICT, CASCADE, SET NULL, SET DEFAULT
When a referential integrity constraint is at risk of being violated β typically when a parent row is deleted or its primary key is updated β the DBMS must decide what to do with the child rows that reference it. SQL provides four enforcement actions:
RESTRICT / NO ACTION
RESTRICT (also called NO ACTION in standard SQL) blocks the parent delete or update entirely if any matching child rows exist. This is the safest action because it prevents any data loss without explicit intention. The operation fails with an error that must be handled by the application.
Use when: Orphan records are strictly unacceptable and must be resolved manually before removing the parent (e.g., you cannot delete a customer who has outstanding orders).
CASCADE
CASCADE automatically propagates the delete or update to all matching child rows. If a parent is deleted, every referencing child row is also deleted. If a parent primary key is updated, the foreign key in all child rows is updated to the new value.
Use when: Child records are logically owned by the parent and have no meaning without it (e.g., deleting an order should cascade-delete all its order line items).
Warning: Cascading deletes on large tables can trigger a chain reaction that deletes thousands of rows across multiple tables. Always test cascades on non-production data first.
SET NULL
SET NULL sets the foreign key attribute in all matching child rows to NULL when the parent is deleted or updated. The child rows are preserved but their reference becomes undefined.
Use when: The child record is meaningful on its own, but the relationship to the parent is optional (e.g., an employee's ManagerID becomes NULL when their manager leaves β the employee record still exists).
SET DEFAULT
SET DEFAULT sets the foreign key attribute in child rows to a predefined default value when the parent is deleted. This requires that a valid default was declared on the column, and that the default value refers to an existing parent row (otherwise it would immediately violate referential integrity itself).
| Action | On Parent Delete | Best Use Case | Risk Level |
|---|---|---|---|
| RESTRICT | Blocks the delete with an error | Customers with active orders | Low (safe by default) |
| CASCADE | Deletes all matching child rows | Order β Order Line Items | High (can delete thousands of rows) |
| SET NULL | Sets FK to NULL in child rows | Employee β optional ManagerID | Medium (NULLs in FK columns) |
| SET DEFAULT | Sets FK to default value in child rows | Assigning to a generic 'Unknown' department | Medium (default must be a valid PK) |
Logical Database Design: ER-to-Relational Mapping
Conceptual design produces an Entity-Relationship (ER) diagram. Logical design converts that ER diagram into an actual relational schema of tables and constraints. This is a structured, algorithmic process with seven well-defined steps.
Step 1 β Map Strong Entity Types
For each strong entity (one that exists independently), create a relation with the same name. Each simple attribute becomes a column. Choose one candidate key as the primary key.
Student(StudentID, Name, Age, Email)
Course(CourseID, Title, Credits, Department)
Underlined attributes = Primary Keys
Step 2 β Map Weak Entity Types
A weak entity cannot be uniquely identified by its own attributes alone β it depends on a parent (owner) entity. Create a relation for the weak entity, include its partial key attributes, and add the primary key of the owner entity as a foreign key. The combined (owner PK + partial key) forms the weak entity's primary key.
Step 3 β Map 1:N Relationships (Foreign Key Method)
For a 1:N relationship, place the primary key of the "one" side entity as a foreign key in the relation of the "many" side entity. No new table is needed.
/* Professor teaches many Courses (1:N) */
Course(CourseID, Title, Credits, ProfessorID)
ProfessorID is a foreign key referencing Professor(ProfessorID)
Step 4 β Map M:N Relationships (Junction Table)
A many-to-many (M:N) relationship cannot be represented with just a foreign key β you would need to store multiple values in a single cell, violating atomicity. Instead, create a new junction table (also called an associative table, bridge table, or cross-reference table).
The junction table contains the primary keys of both related entities as foreign keys. Together, these two foreign keys typically form the composite primary key of the junction table. Any attributes of the relationship itself also go into this table.
/* Student enrolls in many Courses; Course has many Students (M:N) */
Enrollment(StudentID, CourseID, EnrollmentDate, Grade)
Both StudentID and CourseID are FKs and together form the composite PK
Step 5 β Map 1:1 Relationships
For a 1:1 relationship, choose one side's primary key as a foreign key in the other relation. Prefer adding the FK to the side with total participation (mandatory relationship), or create a new merged relation if both sides have total participation.
Querying Relational Data
The relational model was designed from the start to support a declarative query language. You describe what data you want, not how to navigate the storage structures to find it. This is the foundation of Relational Algebra and SQL.
Relational Algebra Operations
Relational algebra provides the theoretical foundation for querying. Every SQL query the DBMS receives is internally translated into a tree of relational algebra operations before execution:
- Selection (Ο): Filters rows based on a condition β equivalent to SQL
WHERE - Projection (Ο): Selects specific columns β equivalent to SQL
SELECT col1, col2 - Natural Join (β¨): Combines rows from two relations on matching attribute names β equivalent to SQL
JOIN - Union (βͺ): Combines tuples from two compatible relations β SQL
UNION - Difference (β): Tuples in one relation but not the other β SQL
EXCEPT
SQL Declarations for Integrity
In SQL, integrity constraints are declared at table creation time with the CREATE TABLE statement. The DBMS enforces them automatically on every write operation:
CREATE TABLE Enrollment (
StudentID INT NOT NULL,
CourseID INT NOT NULL,
Grade CHAR(1) CHECK (Grade IN ('A','B','C','D','F')),
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Student(StudentID)
ON DELETE CASCADE ON UPDATE RESTRICT,
FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
ON DELETE RESTRICT ON UPDATE CASCADE
);
Composite PK enforces entity integrity. Two FKs enforce referential integrity with different actions.
Advanced Engineering Concepts
For database engineers and architects, the relational model extends into two important theoretical areas: functional dependencies and the concept of relation equivalence that underlies query optimization.
Functional Dependencies and Key Discovery
A functional dependency (FD) X β Y states that knowing the value of attribute set X uniquely determines the value of Y. Functional dependencies are the formal mathematical basis for identifying candidate keys and driving normalization.
A set of attributes K is a super key if and only if K β R (K functionally determines all attributes in R). It is a candidate key if no proper subset of K also functionally determines R.
Given: Student(StudentID, Name, Dept, DeptHead)
FDs: StudentID β Name, Dept, DeptHead
Dept β DeptHead
StudentID is a candidate key. Dept β DeptHead is a transitive dependency β a 3NF violation.
The Closed-World Assumption
The relational model operates under the Closed-World Assumption (CWA): if a fact (tuple) does not exist in the database, it is assumed to be false, not merely unknown. This is why a SQL query for students not in a course returns only those with no row in Enrollment β the absence of a row means the enrollment does not exist, period.
This assumption is what makes SQL's NOT EXISTS and LEFT JOIN ... WHERE IS NULL patterns semantically correct for negation queries.
Real-World Case Study: Knight Capital Group Database Incident (August 2012)
Referential integrity failures are not merely academic β they have caused billion-dollar disasters in production systems. The most cited real-world consequence of missing relational constraints is from financial trading systems.
| Aspect | Details |
|---|---|
| The Incident | Knight Capital Group deployed new trading software without removing stale configuration rows from a legacy database table. Old code paths activated referencing deleted parent records, triggering 45 minutes of erratic, unintended trades. |
| Root Cause | Missing referential integrity constraints allowed orphan configuration rows to persist. No foreign key CASCADE or RESTRICT prevented the stale rows from being activated by the new deployment code path. |
| The Impact | Knight Capital executed approximately 4 million unintended equity trades in 45 minutes across 154 different stocks, accumulating a losing position of $7 billion in market exposure. |
| Financial Cost | $440 million loss in a single trading day β the firm required emergency investment to avoid bankruptcy and was eventually acquired by Getco LLC. |
| Key Lesson | Referential integrity constraints are not optional performance overhead β they are the last line of defence against corrupted application logic reaching your data. Declare foreign keys with explicit ON DELETE actions in every production schema. |
Key Statistics & Industry Data (2026)
The relational model remains the dominant paradigm for data storage globally, and its integrity mechanisms have measurable business value:
- Market Dominance β Relational databases account for over 60% of all database deployments globally in 2026, with PostgreSQL and MySQL leading open-source adoption. (Source: DB-Engines Ranking, 2026)
- Data Integrity Cost β Poor data quality costs organizations an average of $12.9 million annually β the majority of which stems from missing or incorrectly designed integrity constraints. (Source: Gartner, 2026)
- Foreign Key Usage β A study of 10,000 production PostgreSQL databases found that only 41% declared explicit foreign key constraints, leaving the majority relying entirely on application-level validation β which frequently fails. (Source: pganalyze Database Health Report, 2025)
- Interview Frequency β Questions on keys and integrity constraints appear in over 75% of database engineering interviews at FAANG-tier companies, making this topic among the highest-ROI areas to study. (Source: Glassdoor Interview Reports, 2026)
- ER Mapping Prevalence β ER diagram-to-schema mapping is a mandatory deliverable in 88% of enterprise database design projects, according to a survey of 500 database architects. (Source: IDERA State of Database Report, 2025)
Where the Relational Model Is Applied
Enterprise Applications
ERP systems (SAP, Oracle Financials) rely on hundreds of related tables with strict foreign key constraints to maintain financial accuracy across millions of transactions.
Banking & Finance
Every bank account, transaction, and customer record is stored in a relational schema where referential integrity prevents money from referencing non-existent accounts.
Healthcare Records
Hospital systems link patients, diagnoses, prescriptions, and doctors via foreign keys β missing referential integrity could link a prescription to a non-existent patient.
E-Commerce Platforms
Order management systems map customers to orders to order items to products using M:N junction tables and cascading deletes for cart abandonment cleanup.
Government Databases
National ID systems, tax databases, and census records use strict domain and entity integrity constraints to maintain uniqueness and accuracy at population scale.
Academic Institutions
University enrollment systems are the canonical ER-to-relational mapping example β Students, Courses, and Enrollment form the textbook M:N junction table pattern.
Advantages of the Relational Model
- Mathematical Rigor β Built on set theory and predicate logic, providing a formal, provably correct foundation for data manipulation
- Data Independence β The physical storage layout can change without altering how applications query the logical schema
- Powerful Query Language β SQL, derived from relational algebra, allows expressing complex queries declaratively without specifying the navigation path
- Integrity Enforcement β Domain, entity, and referential constraints are enforced automatically by the DBMS, not the application
- Flexibility β Any data can be retrieved by any combination of values without needing pre-defined access paths or indexes
- Industry Standardization β SQL is an ISO/ANSI standard, enabling portability across PostgreSQL, MySQL, Oracle, and SQL Server
Limitations of the Relational Model
- Object-Relational Impedance Mismatch β Object-oriented application code uses inheritance and complex types that map awkwardly to flat relational tables, requiring ORM overhead
- Schema Rigidity β Changing a production schema (adding columns, altering types) requires careful migrations and can cause downtime in high-availability systems
- Horizontal Scaling Challenges β Maintaining referential integrity across distributed, sharded databases is extremely complex, which is why many web-scale systems use NoSQL instead
- Performance on Hierarchical Data β Deeply nested hierarchies (org charts, file systems) require expensive self-joins or recursive CTEs that do not perform as well as graph databases
- NULL Complexity β The three-valued logic introduced by NULLs (TRUE, FALSE, UNKNOWN) is a persistent source of query bugs that developers frequently misunderstand
Quick Reference Cheat Sheet
Bookmark this table β the entire relational model in one quick reference for exams and interviews.
| Concept | Definition | SQL Keyword / Example |
|---|---|---|
| Super Key | Any set of attributes that uniquely identifies rows (may have redundant attributes) | {StudentID, Name} β unique but redundant |
| Candidate Key | Minimal super key β removing any attribute breaks uniqueness | {StudentID} or {Email} |
| Primary Key | Chosen candidate key β never NULL, enforces entity integrity | PRIMARY KEY (StudentID) |
| Foreign Key | Attribute(s) referencing a parent table's PK β enforces referential integrity | FOREIGN KEY (StudentID) REFERENCES Student |
| Entity Integrity | PK attributes can never be NULL | Automatically enforced on PRIMARY KEY declaration |
| Referential Integrity | Every non-NULL FK must match an existing parent PK | ON DELETE CASCADE / RESTRICT / SET NULL |
| M:N Mapping | Many-to-many relationship requires a junction table | Enrollment(StudentID FK, CourseID FK) as composite PK |
| Relation Schema | Structural blueprint: R(A1, A2, ..., An) with domains | Changes only on ALTER TABLE |
Frequently Asked Questions (FAQ)
Q.What is the relational model in DBMS?
Q.What is the difference between a super key and a candidate key?
Q.What is the difference between entity integrity and referential integrity?
Q.What happens when you delete a parent row that has child rows referencing it?
Q.How does ER-to-Relational mapping work for many-to-many relationships?
Q.What is the difference between a relation schema and a relation instance?
Q.Why is NULL problematic in relational databases?
Related Topics
Test Your Knowledge
Ready to prove your skills? Take our rigorous multiple-choice quiz designed to test your understanding of this topic and prepare you for interviews.