Overview
A Slowly Changing Dimension (SCD) is a dimension in the data warehouse where current and historical data
that changes over time is stored. This is considered as one of the most critical tasks in an Extract,
Transform, and Load (ETL).
The backbone of any data analytics is a robust, stable, and scalable dimensional data model. Often data
models are faced with challenges when it comes to transitional dimensions.
We will discuss and find a solution on how a SCD Type-2 can be maneuvered in Qlik.
What is Slowly Changing Dimension Type-2
Before proceeding, it is crucial to clearly understand what a SCD Type-2 entails. Effectively managing this
slowly changing dimension is pivotal to ensuring accurate reporting. To illustrate, let us consider the case
of an employee who undergoes frequent role changes within an organization. Whenever an employee assumes a
new role, a record in the employee dimension includes relevant information such as their Employee Number,
new Title, Start Date, End Date, and Active Flag. Typically, the Start Date for the new role will be one day
after the End Date of the previous role, while the End Date for the new role is either left null or set as a
future default date, such as 31-Dec-9999. In this context, we want to obtain the Point-in-time Title of an
employee who attended a committee meeting on a specific Committee Date.
Such a scenario can be easily handled in SQL using the between operator, where the query will look something
like the below to fetch the current role of an Employee-
Select EmployeeNo, FullName, Title From dbo.DimEmployee where EmployeeNo = 12345 and
getdate() between EffectiveDate and [End Date];
When scripting in Qlik, between function is not available and that is when we use IntervalMatch().
The IntervalMatch() function is used to match a particular value in one table – it could be a
numerical
number or a date, to a range in another table.
Let us now look at the script.
- We first load the fact table.
- Loading the DimEmployee table which contains the Title which is a SCD.
- In the script below, we are using the extended syntax for IntervalMatch() where there are two parameters
passed:
- CommitteeDate, the match field, is the field, we are comparing against the date period when the
Employee’s Title was effective.
- Cmtt_EmpNo, the Key field. Since, in this scenario, we are looking at the Title effective period
for an
Employee, we must pass the EmpNo as well for join. Where this is used is something that will be
shown in
scrips to follow.
At this point the table, bridge table – - IntervalMatch has the data for Employee Title Effective
date and
End date whenever the employee attended a committee meeting.
- Finally, we join the bridge table with the fact, where the EmployeeNo and the period field will involve
in the
join to get the expected data.
Wrapping Up
In conclusion, the proper handling of a Slowly Changing Dimension (SCD) is critical to accurate and reliable
data
modeling. Without proper management of SCDs, stakeholders may encounter inaccuracies and inconsistencies
when
trying to analyze data, leading to suboptimal decision-making. According to a survey conducted by Experian
Data
Quality, 77% of companies believe that inaccurate data is a key challenge to their business operations. In
addition, a study by Gartner found that poor data quality costs organizations an average of $15 million per
year
in losses. Therefore, it is essential to implement best practices for managing SCDs, such as tracking
changes over
time, properly categorizing different types of changes, and establishing clear protocols for updating and
maintaining data. By prioritizing the correct handling of SCDs, organizations can ensure that their data is
reliable, accurate, and actionable, ultimately leading to improved business outcomes.
Let our expert Qlik Consulting
Services guide you through implementing Slowly Changing Dimension. Discover how to turn your data
into a
strategic asset with our innovative solutions.