Updating / Replacing Fact Records


You will inevitably come across a time when a source system allows the updating of transaction records. This drives data modelers, ETL developers, analytics designers, and analytics users crazy: the source system presented this information as a “fact,” leading everyone to believe it would not change, and now you must design for and resolve the changing fact data. In this blog post we will discuss the options and the impact these decisions will have on your warehouse and BI solution.

Let’s start with the easy and move to the complex. The easiest thing to do when receiving data that causes an existing fact record to change is simply to update the measures in the existing record. Some people will argue that “you cannot just update the fact record. I have already had people use the data and now they will see a change that cannot readily be explained since the measure was overwritten.” Although this statement is true, most data warehouse project owners, sponsors, and users understand that there can be some updates in the measures and will accept some variance without needing to see the individual transaction(s) which caused the change. Here the traditional star schema fact (see below) will address our most common modifications and changes to the fact measures.

Figure 1 – Traditional Star Schema Fact

For example, a person comes back to the store with their receipt and a coupon for a 20% discount. The easiest thing to do is to add the discount percentage to the existing fact record and adjust the total sale amount.
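As a sketch, this in-place update might look like the following. The table and column names (FACT_SALES, DISCOUNT_PCT, TOTAL_SALE_AMT, SALE_ID) are illustrative assumptions, not names from the model above:

-- Apply the 20% coupon directly to the existing fact row
UPDATE FACT_SALES
SET DISCOUNT_PCT = 0.20,
    TOTAL_SALE_AMT = TOTAL_SALE_AMT * 0.80
WHERE SALE_ID = 1001;  -- the transaction on the customer's receipt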

Next, we could capture the change as a modification to the existing record. An example of this would be a return of a purchased item. In this situation you would not update the quantity sold but rather add a new transaction showing the return of the item with a negative quantity and sale amount. When you sum the transaction fact, you will get the correct number of items “sold” for the period you are researching. This is modeled the same way we model for overwriting fact records. See the traditional star schema fact data model above.
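A return, then, is recorded as a new offsetting row rather than an update. For example (again with illustrative table and column names):

-- The original sale row is untouched; the return is a new fact row
INSERT INTO FACT_SALES (SALE_ID, PROD_KEY, DATE_KEY, QTY_SOLD, SALE_AMT)
VALUES (1002, 517, 20190215, -1, -24.99);
-- SUM(QTY_SOLD) and SUM(SALE_AMT) now net out the returned item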

Finally, I have come across some demanding customers who want to be able to recreate a report as it was on any specific day. When I come across these requirements, I capture the changes to the fact record in the same way as a Type II dimension change. These slowly changing facts contain active-record start and end dates, along with an active-record flag marking the current record for any time period we are reporting upon. See the example data model below.

Figure 2 – Slowly Changing Fact

This model allows for the capture and reporting of a series of measures provided by the transactional system, where the system can update the records and the customer wants to be able to recreate the report as it appeared on any specific day. Here is an example of a record which was initially inserted on 1/1 and then updated on 2/1, 2/7, 2/15, and 3/1. (Due to space limitations and for clarity, the example shows only the DEPT_ID instead of the DEPT_KEY.)
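A minimal sketch of the FACT_DEPT_STATUS table behind this example, using the active-record columns referenced in the queries below (the data types and the measure column are assumptions):

CREATE TABLE FACT_DEPT_STATUS (
    DEPT_KEY            INTEGER NOT NULL,  -- surrogate key to the department dimension
    DEPT_STATUS_CD      CHAR(2),           -- illustrative measure/attribute column
    ACTV_RCRD_STRT_DT   DATE NOT NULL,     -- date this version became active
    ACTV_RCRD_END_DT    DATE NOT NULL,     -- date this version was superseded
    ACTV_RCRD_FL        SMALLINT NOT NULL, -- 1 = current version of the record
    ACTV_MNTHLY_RCRD_FL SMALLINT NOT NULL  -- 1 = active version as of month end
);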

[Table: versions of the record as of January 1, February 1, February 7, February 15, and March 1]

The challenge with this model is that you need to do a lot of extra work to ensure you get the correct answer. The ETL developers need to capture each changing transaction and set all of the flags correctly, and every query needs to use those flags to return the correct answers.
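The ETL pattern for capturing a change mirrors a Type II dimension update: expire the current version, then insert the new one. A sketch, with illustrative key values and a hypothetical DEPT_STATUS_CD measure column:

-- Step 1: expire the current version of the record
UPDATE FACT_DEPT_STATUS
SET ACTV_RCRD_END_DT = DATE '2019-02-28',
    ACTV_RCRD_FL = 0
WHERE DEPT_KEY = 42
  AND ACTV_RCRD_FL = 1;

-- Step 2: insert the new version as the active record
INSERT INTO FACT_DEPT_STATUS
    (DEPT_KEY, DEPT_STATUS_CD, ACTV_RCRD_STRT_DT, ACTV_RCRD_END_DT, ACTV_RCRD_FL, ACTV_MNTHLY_RCRD_FL)
VALUES
    (42, 'OP', DATE '2019-03-01', DATE '9999-12-31', 1, 0);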

Now that you have the data stored in the fact, let’s get the data out for reporting. To get the current state of the fact, you select for the active record:

SELECT *
FROM FACT_DEPT_STATUS
WHERE ACTV_RCRD_FL = 1;

To obtain the monthly historical values of the record you use the active monthly record flags:

SELECT *
FROM FACT_DEPT_STATUS
WHERE DEPT_ID = 'AAB234'
AND ACTV_MNTHLY_RCRD_FL = 1;

Finally, to obtain the fact as it was on February 24 you will need to run the following SQL:

SELECT *
FROM FACT_DEPT_STATUS
WHERE DEPT_ID = 'AAB234'
AND TO_DATE('20190224', 'YYYYMMDD') BETWEEN ACTV_RCRD_STRT_DT AND ACTV_RCRD_END_DT;

As you can see, custom SQL is needed to answer each question; this allows us to meet the customer’s requirements but is sub-optimal for the analytics and reporting systems.

In summary, facts can sometimes be malleable, causing challenges for designers, developers and users alike.

Jim McHugh
An accomplished Sr. IT leader with over 27 years of professional experience applying technology solutions to business challenges. Mr. McHugh has expertise in data modeling, data governance, business intelligence, predictive analytics and data science. His responsibilities include establishing and executing a strategy that ensures the application of data management & analytics to enable an organization to strategically leverage and fully realize the value of their data.
