Data Warehouse Design Techniques – Derived Schemas

Loading Hierarchical Bridge Tables
May 24, 2017
Data Warehouse Design Techniques – Aggregates
July 5, 2017
Getting the correct answer is one of the most important requirements of a data warehouse, but this is only part of the requirement. The full requirement is to provide the correct information, to the user at the right time. The information is no good to the user if they need the answer today but you need to write a custom report which will take a week to develop, test and deploy. Although you will be able to provide the correct answer, it will be late and therefore of limited to no value to the decision-making process. How can we address this issues of flexibility and timeliness while building our data warehouse? In this blog article, I will demonstrate how derived schemas can address these challenges.

Snapshot Fact & Accumulating Snapshot Fact Tables
We have previously discussed these data structures in my previous blog articles titled “Snapshot Fact Tables” and “Accumulating Snapshot Fact Tables”. In these articles, I showed how we improve performance by moving the aggregation and computation of the metrics from the query to the ETL. This change of where the compute takes place and the storage of this information in these fact tables is the cost we pay for the improved performance and enhanced user experience.

Merged Fact Tables
The Merged Fact is the most common derived schema object we encounter. This solution combines the measures from multiple fact tables which share a similar grain. This new consolidated fact table will allow the user to quickly and efficiently “drill across” these attributes since they are stored in the same data structure.
An example of this can be seen below. Here I am looking at two fact tables, one contains the actual sales transactions and the other contains the monthly sales goals, along with their corresponding dimension tables.

If my customer wants to report on their sales month over month against their target I need to sum up the daily transactions to a monthly level from the FACT_SALE table for the timeframe in question and then compare that against the data contained in the FACT_SALE_GOAL table. If I want to see a different but overlapping timeframe I will need to recalculate those monthly sales by summing up the daily sales by month for each month I want to display even though the number will not change after the month has completed. Doing this at runtime is resource intensive and will cause delays in the delivery of the report to your customer.

A Merged Fact would eliminate this real-time calculation of data by creating a fact table which stored the monthly sales data for each product as a single record and the monthly sales goal metrics (quantity and sales amount) as attributes of the fact table.

Pivoted Fact Tables
The Pivoted Fact table is one in which we “pivot” from row based to column based data. An example of this can be seen in the example above when considering returns (sale type). Here there will be a negative quantity and a negative sale amount. One can reasonably expect that a user will want to know the net sales per month. To do this we can pivot the fact table on the sale type resulting in the fact table below.

Now that these metrics are on the same row we can easily do simple math the calculate the total quantity and amount sold by month.

Sliced Fact Tables
The Sliced Fact table is one where you either derive a subset of data from the “parent” fact table or you create several subordinate facts which can be unioned together to make the “parent” fact table. For instance, a company has 2 major regions, East and West, and wishes for each of the organizational units in these regions to only be able to see and update the data in their region. Corporate will have a master view of this data and can provide the regional data to the organizational units via views or physical tables limiting the data to a single region.

Set Operation Fact Tables
The Set Operation Fact table creates a derived fact table with precomputed results based on a set operation (union, intersection, minus) or the use of a subquery or correlated subquery from two or more star inputs. For example, let’s assume that we have a membership society which also runs a publishing division. The Director of the Membership Division wants to know who the authors who have published articles and are not members of the organization. Here we would need to use a set operation fact table (FACT_POTENTIAL_MEMBER) to hold the information derived from subtracting (minus) the membership fact data from the article author fact table.

1 Comment

  1. Muhammad Shahzad says:

    Nice post JIM…

Leave a Reply

Your email address will not be published. Required fields are marked *