Understanding a Primary Key with SQL

We use primary keys on most tables while utilizing the SQL language in an OLTP or OLAP data model. These keys act as a unique identifier for a single table. Its typically defined as an integer data type and values are almost always incremental (ex. 1,2,3, and so on).

But why do we need them?

These keys are needed since they are primarily used as a pointing reference to another table. As a result, a pointer to a primary key on another table is referred to as a foreign key. The image below shows how a primary key is used as a reference (foreign key) to another table.

primary key example in an rdbms database
Primary/Foreign Key Relationship

Primary keys are almost always indexed fields. In fact, most databases will index them by default. As a result, the database optimizes and prioritizes these columns for performance reasons. This is because these keys are references as foreign keys to other tables. We can see in the above example how the customer data gets tied to the invoice data through the reference of this key.

Uniqueness & Enforced Integrity

These keys are also primarily used to enforce integrity of a table meaning there can be only one primary key value per table. These values will always be unique because they have enforced uniqueness. If we were to try and insert a duplicate value then we would see an error similar to the one below.

sqlite primary key constraint error
Primary Key Constraint Error in SQLite

On top of being unique, they cannot be NULL or empty. Even though Primary Keys are unique they should not be confused with a unique key. Unique keys will allow NULL values to exist.

All in all, the important things to remember with primary keys are:

  1. Always NOT NULL
  2. They must be UNIQUE
  3. Indexed by the database (usually by default)
  4. Used as a foreign key on another table (usually)
  5. Used as a point of record
Final Thoughts

At the end of the day, these keys are a crucial piece of any relational database table. As a result, it’s important to understand these keys so we can write performant queries and create appropriate and optimized data models.