Database

Database Theory - Key types

Feb 09, 2025
9 min read

Super Key

A Super Key is a set of one or more attributes (columns) in a table that can uniquely identify each record in that table. This means that if you take any row in the table, a super key will ensure no two rows share the same combination of attributes in that set. A single column, like a student’s ID, can be a super key if it’s guaranteed to be unique. Alternatively, multiple columns together, like a customer’s first name plus their phone number, could form a super key if no two customers share exactly the same name and phone number.

It’s important to note that a super key can sometimes include extra attributes beyond what’s strictly necessary to uniquely identify a row. That’s where the concept of minimalism comes into play. In practice, a super key might be “overkill” for identification if it contains more attributes than needed. However, it still counts as a super key as long as the combination of attributes is unique across all records.

Let’s consider a simple Employees table:

EmployeeIDFirstNameLastNameEmailPhoneNumber
101JohnSmith[email protected]555-1234
102SarahJohnson[email protected]555-5678
103DavidLee[email protected]555-7890
  • If EmployeeID alone is unique (which it usually is), then (EmployeeID) is a super key.
  • If no two employees share the same combination of Email and PhoneNumber, then (Email, PhoneNumber) is also a super key.
  • Even (EmployeeID, Email, PhoneNumber) is a super key because it still uniquely identifies each row—though it’s definitely overkill.

Candidate Key

A Candidate Key is like the “golden child” subset of super keys: it’s a minimal super key. This means it has no unnecessary attributes. If you removed any attribute from a candidate key, it would stop being unique. In other words, all parts of a candidate key are essential to guarantee uniqueness.

The reason we call it a “candidate” key is because each candidate key competes to become the Primary Key. You might have multiple candidate keys within a single table, but typically you choose one of them as the primary key. For instance, you might find that Email can uniquely identify an employee and so can EmployeeID, which means both are candidate keys. You’d then decide which one is the best primary key for your design. If you end up picking EmployeeID as the primary key, Email remains a candidate key that wasn’t chosen as the official primary key.

EmployeeIDFirstNameLastNameEmailPhoneNumber
101JohnSmith[email protected]555-1234
102SarahJohnson[email protected]555-5678
103DavidLee[email protected]555-7890

Let’s assume that we enforce uniqueness on EmployeeID and also on Email.

  • (EmployeeID) is a candidate key, because it uniquely identifies rows and contains no unnecessary columns.
  • (Email) is another candidate key, assuming no two employees are allowed to have the same email address.

No columns can be removed from either (EmployeeID) or (Email) while still maintaining uniqueness (each is just one column), so both are valid candidate keys.

Composite Key

A Composite Key is a key that is made up of two or more columns (attributes) in a table, and that combination uniquely identifies each row in the table. The reason for using multiple columns together is that no single column alone can guarantee uniqueness. We often encounter the need for composite keys in associative (or junction) tables, especially when we have many-to-many relationships between entities.

One common scenario is when you have a table for orders, and you want to track individual items in each order. If each order can have multiple items, and each item can appear in multiple orders, sometimes neither OrderID nor ItemID by themselves are unique. However, the combination of OrderID and ItemID together might be unique. This is precisely where a composite key shines: it merges multiple columns into a single identifying key.

Consider an OrderItems table:

OrderIDProductIDQuantityUnitPrice
1001200A215.99
1001201B125.50
1002200A315.99
1002202C210.00

If an order can have multiple products, and a product can appear on multiple orders, neither OrderID nor ProductID alone is unique. However, (OrderID, ProductID) together is a composite key that uniquely identifies each row in OrderItems.

Compound Key

The term Compound Key is sometimes used interchangeably with Composite Key, but in some database design literature, people draw a subtle difference. A compound key is often considered a key that is made up of more than one attribute, much like a composite key, but each attribute that makes up the compound key might also be a candidate key in its own right in a different context. This distinction isn’t always taught or widely enforced, so in many practical scenarios, you’ll hear “composite key” used for any multi-attribute key. However, if you do come across the term “compound key,” just recognize it’s another term describing a key formed from multiple attributes, sometimes implying these attributes have significance individually as well.

In simpler terms: whether you hear “composite key” or “compound key,” it generally indicates that you have at least two columns working together to identify rows. The difference (if recognized at all) can be that the columns in a compound key might each be keys or are at least very significant individually, whereas the columns in a composite key might only be unique when combined. Don’t worry if the terms seem to blend together—many professionals treat them as synonyms.

A good illustration is a StudentCourses table that records which students are enrolled in which courses:

StudentIDCourseIDEnrollmentDate
S001C1002025-01-15
S001C2002025-02-01
S002C1002025-01-20
S003C3002025-02-10

Here, (StudentID, CourseID) might be called a compound key if, for example, StudentID is a primary key in a Students table and CourseID is a primary key in a Courses table. You could still label it a composite key, but some designers prefer the term “compound key” to highlight the significance of each column in other places. Either way, it’s a multi-column key.

Primary Key

A Primary Key is the main key for a table, chosen from among its candidate keys, to represent each record uniquely. Think of it as the “official ID” for every row. Typically, we pick a primary key based on criteria like simplicity, ease of maintenance, and clarity. Sometimes organizations create a dedicated surrogate key, like an automatically incrementing integer (in many relational database systems, it’s called an auto-increment or identity column) to serve as the primary key. This approach ensures that each record always has an ID, even if its other columns might change over time or could accidentally be duplicated.

The primary key also helps ensure that the database can quickly locate a specific record without confusion. When we talk about “referential integrity,” we often refer to the primary key in one table being referenced by a foreign key in another table, keeping data neatly linked together. This helps prevent orphan records or mismatched relationships, as each reference points to an existing, valid primary key.

Here’s our Employees table with EmployeeID chosen as the primary key:

EmployeeIDFirstNameLastNameEmailPhoneNumber
101JohnSmith[email protected]555-1234
102SarahJohnson[email protected]555-5678
103DavidLee[email protected]555-7890

Because EmployeeID is set as the primary key, the database will ensure that no two rows share the same ID, and it won’t allow null values in that column.

Foreign Key

A Foreign Key is a column or group of columns in a table that points to a primary key (or sometimes a candidate key) in another table. This creates a relationship between the two tables. The main job of a foreign key is to maintain referential integrity, ensuring that if one table claims a certain record exists in another table, that record truly does exist there. For instance, if you say an order belongs to a particular customer, the customer ID in the Orders table must match an existing customer ID in the Customers table.

Foreign keys keep your database consistent. Without them, you might have “dangling references,” where an order points to a customer who has been deleted or never existed. By using foreign keys, the database can enforce the rules that say, “You can’t have this order unless the customer exists in our system,” or “You can’t delete this customer while they still have outstanding orders,” depending on the constraints you set (like ON DELETE CASCADE, ON DELETE NO ACTION, etc.).

Consider a Departments table and an Employees table. The Departments table might look like this:

DepartmentIDDepartmentName
1Human Resources
2Finance
3IT

And the Employees table:

EmployeeIDFirstNameLastNameDepartmentID
101JohnSmith1
102SarahJohnson3
103DavidLee2

In this design, DepartmentID in Employees is a foreign key referencing DepartmentID in Departments. If you tried inserting an employee with a DepartmentID that doesn’t appear in the Departments table, the database would reject it (unless you disabled or bypassed the constraint). This relationship also means you can’t just delete a row from Departments if employees are tied to it—unless you handle it through a cascade or some other approach.

Alternate Key

An Alternate Key is basically any candidate key that was not chosen to be the primary key. We saw earlier that a table can have multiple candidate keys, but only one gets the official title of “primary key.” The others still remain valid unique identifiers; we just call them “alternate keys.” They can come in handy for queries, indexing, or references. The existence of alternate keys does not undermine their importance; it just means they aren’t the main designated primary key.

If, for example, you are confident that an email address or a phone number is always unique, then that column (or a combination of columns) might be considered a candidate key. If you picked something else—like an auto-incrementing EmployeeID—as your primary key, that email or phone column would still be a perfectly good alternate key. You can even apply a unique constraint on it so that nobody can insert duplicate values.

Returning to the Employees table, let’s say EmployeeID is our official primary key. However, Email is also guaranteed to be unique. That makes Email a candidate key. Because we’ve already chosen EmployeeID as the primary key, Email is labeled as an alternate key. Here’s the table in markdown form:

EmployeeIDFirstNameLastNameEmailPhoneNumber
101JohnSmith[email protected]555-1234
102SarahJohnson[email protected]555-5678
103DavidLee[email protected]555-7890

If we put a UNIQUE constraint on Email, then it maintains uniqueness while still being considered an alternate key.

© 2025 Programmer Codex

GitHub