Loading Hierarchical Bridge Tables

Data Warehouse Design Techniques – Accumulating Snapshot Fact Tables
April 21, 2017
Data Warehouse Design Techniques – Derived Schemas
June 16, 2017

This blog article is a follow-up to the Ragged Hierarchical Dimensions article I posted a few weeks ago.  In this article, I spoke of using a hierarchical bridge table at a high level, today I will discuss the nuances of this method in more detail.Let’s begin at the dimension table. In order to capture the data with the appropriate level of detail you will need to design your dimension with recursive fields.I have designed this dimension table as a type 2 dimension table and I include the recursive (parent) field as an attribute. This allows me to capture the point-in-time changes of the dimensional attributes while still being able to display the hierarchy.

Most reporting tools cannot handle this recursive field, so we need to find a way to transform the recursive join into a relational join. To do this we will use recursive query as a part of the ETL process to create this relational join (bridge) table. The query will create a relational hierarchy consisting of each record in the hierarchy associated with itself with all of its subordinates regardless of level. This query will also capture the number of levels a subordinate record is from the root node of the hierarchy. Finally, we will also set a flag to indicate that the record is the top (root node) of the hierarchy.

Using the organizational chart below I would expect to receive the following recursive data from the operational application.

From this data we will create a type 2 dimensional table which will look like this:

To transform this table into a hierarchical bridge table (BR_EMP_HRCHY) we will run the following query:

SELECT connect_by_root EMP_KEY superior_key
,EMP_KEY subordinate_key
,LEVEL- 1 level_from_root
,DECODE(connect_by_root EMP_KEY,EMP_KEY,1,0) top_flag
FROM EMPLOYEE
CONNECT BY NOCYCLE PRIOR EMP_ID = EMP_MGR_ID;

This statement will create a key/value pairing of parent/child records for all records in the recursive table. The linchpin of this query is the connect_by_root command. This command will connect all records in the recursive table to all of its subordinate records, including itself. This is EXACTLY what we need to create our relational join table. The level -1 will tell us how far away from the root level the record is. The DECODE statement is used to define the top record. This flag is also important as it will be used in our queries to avoid duplicates, I will provide examples of how the columns level_from_root and top_flag are used later in this blog.

This query will create the following output:

Now that the bridge table is designed and loaded let’s look at how to use the hierarchical bridge table. This table can be used between a fact table and the employee dimension. This will allow us to easily answer the following example questions:

  • How many people work for Quinton?

To answer this question, we only need to use the DIM_EMPLOYEE and the BR_EMP_HRCHY table. First, we find all records where Quinton’s employee key is the superior key in the bridge table.

SELECT A.EMP_NAME MANAGER_NAME
,SUPERIOR_KEY SUPERIOR_KEY
,SUBORDINATE_KEY SUBORDINATE_KEY
,B.EMP_NAME EMPLOYEE_NAME
,LEVEL_FROM_ROOT LEVEL_FROM_ROOT
,TOP_FLAG TOP_FLAG
FROM BR_EMP_HRCHY
JOIN DIM_EMPLOYEE A ON A.EMP_KEY = SUPERIOR_KEY
JOIN DIM_EMPLOYEE B ON B.EMP_KEY = SUBORDINATE_KEY
WHERE SUPERIOR_KEY IN (SELECT EMP_KEY
FROM DIM_EMPLOYEE
WHERE EMP_NAME = ‘Quinton’);

As you can see this will produce a result of 9 records, which is incorrect. There are only 8 people who work for Quinton. This result includes Quinton, which in some cases is necessary but in this instance, it is not. We need to filter out Quinton from the query. Fortunately, we have a top flag which lets us know that this is the root record of the tree. As you can see Quinton’s is joined to himself here, his superior and subordinate keys are the same. Therefore, we can use this flag to filter out this result and get the correct answer.

SELECT A.EMP_NAME MANAGER_NAME
,SUPERIOR_KEY SUPERIOR_KEY
,SUBORDINATE_KEY SUBORDINATE_KEY
,B.EMP_NAME EMPLOYEE_NAME
,LEVEL_FROM_ROOT LEVEL_FROM_ROOT
,TOP_FLAG TOP_FLAG
FROM BR_EMP_HRCHY
JOIN DIM_EMPLOYEE A ON A.EMP_KEY = SUPERIOR_KEY
JOIN DIM_EMPLOYEE B ON B.EMP_KEY = SUBORDINATE_KEY
WHERE SUPERIOR_KEY IN (SELECT EMP_KEY
FROM DIM_EMPLOYEE
WHERE EMP_NAME = ‘Quinton’)

AND TOP_FLAG = 0;

Now the answer is correct!

Let’s try the second example. We have the following Payroll Fact table in our warehouse.

  • Calculate the total salaries of the employees who report to Renaldo for pay period 6 and 7.

SELECT SUM(C.NET_PAY) NET_PAY
FROM BR_EMP_HRCHY
JOIN FACT_PAYROLL C ON C.EMP_KEY = SUBORDINATE_KEY
WHERE SUPERIOR_KEY IN (SELECT EMP_KEY
FROM DIM_EMPLOYEE
WHERE EMP_NAME = ‘Renaldo’)
AND TOP_FLAG = 0 — don’t forget to remove Renaldo from the result
AND PAY_PERIOD_KEY in (201706, 201707);

  • Show me the hierarchy of management to whom Renaldo reports.

SELECT *
FROM BR_EMP_HRCHY
JOIN DIM_EMPLOYEE E ON E.EMP_KEY = SUPERIOR_KEY
WHERE SUBORDINATE_KEY IN (SELECT EMP_KEY
FROM DIM_EMPLOYEE
WHERE EMP_NAME = ‘Renaldo’)
ORDER BY LEVEL_FROM_ROOT;

 

As you can see the hierarchical bridge table is an easy and powerful way to transform a recursive query into a relational query that most analytical tools can use to quickly and efficiently calculate hierarchical metrics. I hope you found this informative and can use this sometime soon on your data warehousing / analytics project.

4 Comments

  1. Thanks a lot for this article and it is useful in my project implementation.

  2. broderie says:

    Very nice article, just what I needed.

  3. Muhammad Shahzad says:

    Awesome.

Leave a Reply

Your email address will not be published. Required fields are marked *

Contact