There are actually 8 slowly changing dimension table types in dimensional modeling. Most people only focus on the 3 main slowly changing dimension types (aka SCD’s). Being asked provide the primary SCD types usually referred to as type 1, type 2 and type 3 is a common interview question when it comes to data warehousing.
It’s also crucial to understand these table type differences when designing a data warehouse. Understanding the differences will allow us to ask the right questions prior to building these tables so that future reporting and analytic needs can be met.
Here’s the quick rundown on the 3 main types of SCD’s:
- Overwrites existing attributes to maintain only current data.
- Add new rows to maintain history.
- Add new columns to maintain current and prior values.
Type 1 – Only the latest and greatest
The most common of all dimensions is Type 1 and only offers the most current data in a table based on it’s primary key. We can look at the following example.
Let’s assume that our natural key (aka business key) is based on the FirstName and LastName fields. If Frank Harris decides to move to another company – that would reflect in the current record with the CustomerKey of 16. As an example, let’s say he gets a new job from Google to Facebook.
The same record is maintained and the attributes for Frank’s company get updated to his new job.
Type 2 – Adding rows to maintain historical data
Let’s use our first example again, but since we are trying to maintain the history we need to add a start date and an end date. These two columns are common additions in a type 2 slowly changing dimension.
Now that we are able to track the record based on start and end dates, let’s see what happens when Frank leaves Google and takes the job at Facebook.
We can see that a new record was created for Frank’s new employer. Since Frank still works there, there’s no end date yet. If Frank moved companies again, we’d see another record get added and the end date would get populated for his employment at Facebook.
Type 3 – Adding columns for current and previous records
Last but not least, type 3, which is the least used of the 3 types. In this dimension type, we only care about a very small set of historical data – specifically the previous record. Let’s continue using Frank as an example.
Instead of adding rows like a type 2 SCD, we add columns to maintain the current and previous records. In this case, we don’t maintain a full history of the data – just a limited slice.
Knowing and understanding the different types of SCD’s is critical in making sure that a data warehouse has been designed appropriately. The last thing we want is to have a Type 1 dimension when we needed a Type 2 or a Type 3 since that means we’d be losing data. In a growing data-centric economy, it’s becoming more and more important to maintain historical data for analytics and machine learning so be sure to design appropriately.