Data Warehouse Design Techniques – Simple Hierarchical Dimensions

Data Warehouse Design Techniques – Snowflaking
February 8, 2017
Data Warehouse Design Techniques – Ragged Hierarchical Dimensions
March 8, 2017
Today, continuing through our look at the advanced modeling techniques of dimensional data warehouses, we will take a look at simple hierarchical dimensions.

Simple Hierarchical Dimensions

Hierarchical dimensions are those dimensions which have a parent/child relationship. In simple hierarchies there every child has a parent at the level above with no skipping of levels. A date (day, month, quarter, year) is the most common example of a simple hierarchical dimension and one that is used by all dimensional data warehouses.

The question for the data modeler is how to model the hierarchy? Let’s use the date dimension for some initial examples. The first and most common way to model the date hierarchy is to flatten the date into a single dimension where you have every piece of information you need concerning the date in a single location.

In this flattened dimensional hierarchy, you can see all of the parent, grandparent, great-grandparent, etc. information of the date in a single table. The selecting of a date will provide you with the detailed information about the date (day of week name, if the day is a holiday, etc.), as well as the information about the week, month, quarter and year of the calendar year as well as that of the fiscal year. One simple select will provide you with a large amount of detail.

 

Snowflaking Hierarchical Dimensions

A different way to model this information would be to snowflake the dimension. As can be seen below.

 

Here you will obtain the same information but, as I pointed out in a previous post, it will take you several additional joins to obtain the same information you could obtain from a single join when using a totally flat hierarchical dimension.

The snowflake is often used to address the challenge of having dimensions which need to use the date at different parts of the hierarchy. For instance, I want to capture the measures at a monthly or quarterly level, not at a specific date. To address this issue without snowflaking one would simply create new dimensions with the unique data from that point forward. I have modeled the DIM_MONTH and DIM_QUARTER dimensions below.

 

One would use these dimensions as needed, always using the dimension with the most detailed information for the grain of the fact.

Other examples of simple hierarchical dimensions can be seen using a store dimension. A store can be in a district and a group of districts can encompass a region.

Next week I will discuss advanced dimensional design techniques for ragged hierarchies, hierarchies which can skip one or more levels.

6 Comments

  1. Hey Jim, i haven’t seen this much detailed datetime dimension so far….Does they really exist?

  2. Muhammad Shahzad says:

    Jim, Can you share your query?

    • Jim McHugh says:

      DROP TABLE DIM_DATE PURGE;
      CREATE TABLE DIM_DATE
      (DATE_KEY NUMBER
      ,DATE_FULL_NUMBER VARCHAR2(20)
      ,DATE_FULL_STRING VARCHAR2(50)
      ,DATE_WEEKDAY_FL NUMBER
      ,DATE_US_CIVIL_HOLIDAY_FL NUMBER
      ,DATE_LAST_DAY_OF_WEEK_FL NUMBER
      ,DATE_LAST_DAY_OF_MONTH_FL NUMBER
      ,DATE_LAST_DAY_OF_QTR_FL NUMBER
      ,DATE_LAST_DAY_OF_YR_FL NUMBER
      ,DATE_LAST_DAY_OF_FSCL_QTR_FL NUMBER
      ,DATE_LAST_DAY_OF_FSCL_YR_FL NUMBER
      ,DATE_DAY_OF_WEEK_NAME VARCHAR2(20)
      ,DATE_DAY_OF_WEEK_ABBR VARCHAR2(20)
      ,DATE_MONTH_NAME VARCHAR2(20)
      ,DATE_MONTH_NAME_ABBR VARCHAR2(20)
      ,DATE_DAY_NUMBER_OF_WEEK NUMBER
      ,DATE_DAY_NUMBER_OF_MONTH NUMBER
      ,DATE_DAY_NUMBER_OF_QTR NUMBER
      ,DATE_DAY_NUMBER_OF_YR NUMBER
      ,DATE_DAY_NUMBER_OF_FSCL_QTR NUMBER
      ,DATE_DAY_NUMBER_OF_FSCL_YR NUMBER
      ,DATE_WEEK_NUMBER_OF_MONTH NUMBER
      ,DATE_WEEK_NUMBER_OF_QTR NUMBER
      ,DATE_WEEK_NUMBER_OF_YR NUMBER
      ,DATE_WEEK_NUMBER_OF_FSCL_QTR NUMBER
      ,DATE_WEEK_NUMBER_OF_FSCL_YR NUMBER
      ,DATE_MONTH_NUMBER_OF_YR NUMBER
      ,DATE_MONTH_NUMBER_OF_FSCL_YR NUMBER
      ,DATE_QTR_NUMBER_OF_YR NUMBER
      ,DATE_QTR_NUMBER_OF_FSCL_YR NUMBER
      ,DATE_YEAR_NUMBER NUMBER
      ,DATE_WEEK_BEGIN_DT DATE
      ,DATE_WEEK_END_DT DATE
      ,DATE_MONTH_BEGIN_DT DATE
      ,DATE_MONTH_END_DT DATE
      ,DATE_QTR_BEGIN_DT DATE
      ,DATE_QTR_END_DT DATE
      ,DATE_YR_BEGIN_DT DATE
      ,DATE_YR_END_DT DATE
      ,DATE_FSCL_QTR_BEGIN_DT DATE
      ,DATE_FSCL_QTR_END_DT DATE
      ,DATE_FSCL_YR_BEGIN_DT DATE
      ,DATE_FSCL_YR_END_DT DATE
      ,DATE_CREATE_DT DATE
      ,CONSTRAINT PK_DIM_DATE PRIMARY KEY (DATE_KEY)
      );
      –DROP INDEX IDX_DIM_DATE_01;
      CREATE INDEX IDX_DIM_DATE_01 ON DIM_DATE(SUBSTR(DATE_KEY,5,4)) NOLOGGING COMPUTE STATISTICS;

      INSERT INTO DIM_DATE
      select to_number(to_char(mydate, ‘yyyymmdd’)) DATE_KEY
      ,to_char(mydate, ‘yyyy-mm-dd’) DATE_FULL_NUMBER
      ,to_char(mydate, ‘fmMonth dd, yyyy’) DATE_FULL_STRING
      ,case
      when to_number(to_char(mydate, ‘D’)) in (1,7)
      then 0
      else 1
      end DATE_WEEKDAY_FL
      ,0 DATE_US_CIVIL_HOLIDAY_FL
      ,case
      when to_char(mydate, ‘fmDay’) = ‘Saturday’
      then 1
      else 0
      end DATE_LAST_DAY_OF_WEEK_FL
      ,case
      when mydate = last_day(trunc(mydate,’MM’))
      then 1
      else 0
      end DATE_LAST_DAY_OF_MONTH_FL
      ,case to_number(to_char(mydate, ‘mmdd’))
      when 331 then 1
      when 630 then 1
      when 930 then 1
      when 1231 then 1
      else 0
      end DATE_LAST_DAY_OF_QTR_FL
      ,decode(to_number(to_char(mydate, ‘mmdd’)),1231,1,0) DATE_LAST_DAY_OF_YR_FL
      ,case to_number(to_char(mydate, ‘mmdd’))
      when 331 then 1
      when 630 then 1
      when 930 then 1
      when 1231 then 1
      else 0
      end DATE_LAST_DAY_OF_FSCL_QTR_FL
      ,decode(to_number(to_char(mydate, ‘mmdd’)),930,1,0) DATE_LAST_DAY_OF_FSCL_YR_FL
      ,to_char(mydate, ‘fmDay’) DATE_DAY_OF_WEEK_NAME
      ,to_char(mydate, ‘fmDy’) DATE_DAY_OF_WEEK_ABBR
      ,to_char(mydate,’fmMonth’) DATE_MONTH_NAME
      ,to_char(mydate,’Mon’) DATE_MONTH_NAME_ABBR
      ,to_number(to_char(mydate, ‘D’)) DATE_DAY_NUMBER_OF_WEEK
      ,to_number(to_char(mydate, ‘DD’)) DATE_DAY_NUMBER_OF_MONTH
      ,trunc(mydate) – trunc(mydate,’Q’) + 1 DATE_DAY_NUMBER_OF_QTR
      ,to_number(to_char(mydate,’ddd’)) DATE_DAY_NUMBER_OF_YR
      ,trunc(mydate) – trunc(mydate,’Q’) + 1 DATE_DAY_NUMBER_OF_FSCL_QTR
      ,case
      when to_number(to_char(mydate, ‘mmdd’)) > 930
      then to_number(to_char(mydate,’ddd’)) –
      to_number(to_char(to_date(to_char(mydate,’yyyy’)||’0930′,’yyyymmdd’),’ddd’))
      else to_number(to_char(mydate,’ddd’)) + 92
      end DATE_DAY_NUMBER_OF_FSCL_YR
      ,to_number(to_char(mydate,’W’)) DATE_WEEK_NUMBER_OF_MONTH
      ,(7 + TRUNC(mydate + 1,’IW’) –
      TRUNC(TRUNC(mydate,’Q’)+1,’IW’))/7 DATE_WEEK_NUMBER_OF_QTR
      ,case
      when TO_CHAR(mydate,’D’) < TO_CHAR(TO_DATE(to_char(mydate,'yyyy')||'0101' ,'YYYYMMDD'),'D') THEN TO_CHAR(mydate,'WW') + 1 ELSE TO_CHAR(mydate,'WW') + 0 end DATE_WEEK_NUMBER_OF_YR ,(7 + TRUNC(mydate + 1,'IW') - TRUNC(TRUNC(mydate,'Q')+1,'IW'))/7 DATE_WEEK_NUMBER_OF_FSCL_QTR ,case when to_number(to_char(mydate, 'mmdd')) >= 1001
      then CASE trim(to_char(TO_DATE(to_number(to_char(mydate,’yyyy’))||’1001′,’YYYYMMDD’),’DAY’))
      WHEN ‘SUNDAY’
      THEN trunc((to_number(to_char(mydate,’ddd’)) – to_number(to_char(to_date(to_char(mydate,’yyyy’)||’1001′,’yyyymmdd’),’ddd’)))/7) + 1
      WHEN ‘MONDAY’
      THEN trunc((to_number(to_char(mydate,’ddd’)) – to_number(to_char(to_date(to_char(mydate,’yyyy’)||’1001′,’yyyymmdd’),’ddd’))+1)/7) +1
      WHEN ‘TUESDAY’
      THEN trunc((to_number(to_char(mydate,’ddd’)) – to_number(to_char(to_date(to_char(mydate,’yyyy’)||’1001′,’yyyymmdd’),’ddd’))+2)/7) +1
      WHEN ‘WEDNESDAY’
      THEN trunc((to_number(to_char(mydate,’ddd’)) – to_number(to_char(to_date(to_char(mydate,’yyyy’)||’1001′,’yyyymmdd’),’ddd’))+3)/7) +1
      WHEN ‘THURSDAY’
      THEN trunc((to_number(to_char(mydate,’ddd’)) – to_number(to_char(to_date(to_char(mydate,’yyyy’)||’1001′,’yyyymmdd’),’ddd’))+4)/7) +1
      WHEN ‘FRIDAY’
      THEN trunc((to_number(to_char(mydate,’ddd’)) – to_number(to_char(to_date(to_char(mydate,’yyyy’)||’1001′,’yyyymmdd’),’ddd’))+5)/7) +1
      WHEN ‘SATURDAY’
      THEN trunc((to_number(to_char(mydate,’ddd’)) – to_number(to_char(to_date(to_char(mydate,’yyyy’)||’1001′,’yyyymmdd’),’ddd’))+6)/7) +1
      ELSE 0
      END
      else CASE trim(to_char(TO_DATE(to_number(to_char(mydate,’yyyy’)-1)||’1001′,’YYYYMMDD’),’DAY’))
      WHEN ‘SUNDAY’
      THEN trunc((91 + to_number(to_char(mydate,’ddd’)))/7) + 1
      WHEN ‘MONDAY’
      THEN trunc((92 + to_number(to_char(mydate,’ddd’)))/7) +1
      WHEN ‘TUESDAY’
      THEN trunc((93 + to_number(to_char(mydate,’ddd’)))/7) +1
      WHEN ‘WEDNESDAY’
      THEN trunc((94 + to_number(to_char(mydate,’ddd’)))/7) +1
      WHEN ‘THURSDAY’
      THEN trunc((95 + to_number(to_char(mydate,’ddd’)))/7) +1
      WHEN ‘FRIDAY’
      THEN trunc((96 + to_number(to_char(mydate,’ddd’)))/7) +1
      WHEN ‘SATURDAY’
      THEN trunc((97 + to_number(to_char(mydate,’ddd’)))/7) +1
      ELSE 0
      END
      end DATE_WEEK_NUMBER_OF_FSCL_YR
      ,to_number(to_char(mydate,’mm’)) DATE_MONTH_NUMBER_OF_YR
      ,to_number(to_char(mydate,’mm’)) DATE_MONTH_NUMBER_OF_FSCL_YR
      ,to_number(to_char(mydate,’Q’)) DATE_QTR_NUMBER_OF_YR
      ,case to_number(to_char(mydate,’Q’))
      when 1 then 2
      when 2 then 3
      when 3 then 4
      when 4 then 1
      end DATE_QTR_NUMBER_OF_FSCL_YR
      ,to_number(to_char(mydate,’YYYY’)) DATE_YEAR_NUMBER
      ,decode(to_char(mydate, ‘fmDay’),’Sunday’,trunc(mydate)
      ,next_day(trunc(mydate-7,’DD’), ‘sun’)) DATE_OF_WEEK_BEGIN_DT
      ,decode(to_char(mydate, ‘fmDay’),’Saturday’,trunc(mydate)
      ,next_day(trunc(mydate,’DD’), ‘sat’)) DATE_OF_WEEK_END_DT
      ,TO_DATE(TO_CHAR(mydate,’YYYYMM’)||’01’,’YYYYMMDD’) DATE_OF_MONTH_BEGIN_DT
      ,trunc(last_day(mydate)) DATE_OF_MONTH_END_DT
      ,case to_number(to_char(mydate,’Q’))
      when 1 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0101′,’YYYYMMDD’)
      when 2 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0401′,’YYYYMMDD’)
      when 3 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0701′,’YYYYMMDD’)
      when 4 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’1001′,’YYYYMMDD’)
      else null
      end DATE_QTR_BEGIN_DT
      ,case to_number(to_char(mydate,’Q’))
      when 1 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0331′,’YYYYMMDD’)
      when 2 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0630′,’YYYYMMDD’)
      when 3 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0930′,’YYYYMMDD’)
      when 4 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’1231′,’YYYYMMDD’)
      else null
      end DATE_QTR_END_DT
      ,TO_DATE(TO_CHAR(mydate,’YYYY’)||’0101′,’YYYYMMDD’) DATE_YEAR_BEGIN_DT
      ,TO_DATE(TO_CHAR(mydate,’YYYY’)||’1231′,’YYYYMMDD’) DATE_YEAR_END_DT
      ,case to_number(to_char(mydate,’Q’))
      when 1 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0101′,’YYYYMMDD’)
      when 2 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0401′,’YYYYMMDD’)
      when 3 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0701′,’YYYYMMDD’)
      when 4 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’1001′,’YYYYMMDD’)
      else null
      end DATE_FSCL_QTR_BEGIN_DT
      ,case to_number(to_char(mydate,’Q’))
      when 1 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0331′,’YYYYMMDD’)
      when 2 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0630′,’YYYYMMDD’)
      when 3 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’0930′,’YYYYMMDD’)
      when 4 then TO_DATE(TO_CHAR(mydate,’YYYY’)||’1231′,’YYYYMMDD’)
      else null
      end DATE_FSCL_QTR_END_DT
      ,case
      when to_number(to_char(mydate,’mm’)) >= 10
      then TO_DATE(TO_CHAR(mydate,’YYYY’)||’1001′,’YYYYMMDD’)
      else TO_DATE(to_number(TO_CHAR(mydate,’YYYY’))-1||’1001′,’YYYYMMDD’)
      end DATE_FSCL_YEAR_BEGIN_DT
      ,case
      when to_number(to_char(mydate,’mm’)) <= 10 then TO_DATE(TO_CHAR(mydate,'YYYY')||'0930','YYYYMMDD') else TO_DATE(to_number(TO_CHAR(mydate,'YYYY'))+1||'0930','YYYYMMDD') end DATE_FSCL_YEAR_END_DT ,sysdate DATE_CREATE_DT from (select trunc(add_months(sysdate, -60),'YY') - 1 + LEVEL mydate from dual connect by level <= (select trunc(add_months(sysdate,132),'YY') -trunc(add_months(sysdate,-60),'YY') from dual) order by 1); COMMIT; Update dim_date set DATE_US_CIVIL_HOLIDAY_FL = 1 where date_key in ( SELECT DATE_KEY FROM DIM_DATE WHERE DATE_KEY IN (SELECT * FROM (SELECT DISTINCT TO_NUMBER(TO_CHAR(NEXT_DAY('14-Jan-'||DATE_YEAR_NUMBER,'Monday'),'YYYYMMDD')) FROM DIM_DATE -- MLK Day UNION SELECT DISTINCT TO_NUMBER(TO_CHAR(NEXT_DAY('14-Feb-'||DATE_YEAR_NUMBER,'Monday'),'YYYYMMDD')) FROM DIM_DATE -- Washington's Birthday UNION SELECT DISTINCT TO_NUMBER(TO_CHAR(NEXT_DAY('23-May-'||DATE_YEAR_NUMBER,'Monday'),'YYYYMMDD')) FROM DIM_DATE -- Memorial Day UNION SELECT DISTINCT TO_NUMBER(TO_CHAR(NEXT_DAY('31-Aug-'||DATE_YEAR_NUMBER,'Monday'),'YYYYMMDD')) FROM DIM_DATE -- Labor Day UNION SELECT DISTINCT TO_NUMBER(TO_CHAR(NEXT_DAY( '7-Oct-'||DATE_YEAR_NUMBER,'Monday'),'YYYYMMDD')) FROM DIM_DATE -- Columbus Day UNION SELECT DISTINCT TO_NUMBER(TO_CHAR(NEXT_DAY('21-Nov-'||DATE_YEAR_NUMBER,'Thursday'),'YYYYMMDD')) FROM DIM_DATE -- Thanksgiving Day UNION SELECT DECODE(DATE_DAY_OF_WEEK_NAME,'Sunday',DATE_KEY+1,DATE_KEY) --Inauguration Day FROM DIM_DATE WHERE SUBSTR(DATE_KEY,5,4) = '0120' AND MOD(DATE_YEAR_NUMBER,4) = 1 UNION SELECT DECODE(DATE_DAY_OF_WEEK_NAME, 'Saturday',DATE_KEY-1, 'Sunday',DATE_KEY+1, DATE_KEY) FROM DIM_DATE WHERE SUBSTR(DATE_KEY,5,4) = '0101' -- New Year's Day OR SUBSTR(DATE_KEY,5,4) = '0704' -- Independence Day OR SUBSTR(DATE_KEY,5,4) = '1111' -- Veteran's Day OR SUBSTR(DATE_KEY,5,4) = '1225' -- Christmas Day ) ) ); COMMIT; SELECT * FROM DIM_DATE where sysdate between DATE_WEEK_BEGIN_DT and DATE_WEEK_END_DT order by date_key; SELECT * FROM DIM_DATE WHERE DATE_YEAR_NUMBER = 2017 AND DATE_US_CIVIL_HOLIDAY_FL = 1 ORDER BY DATE_KEY; SELECT * FROM DIM_DATE where trunc(sysdate) between DATE_WEEK_BEGIN_DT and DATE_WEEK_END_DT; select sysdate from dual;

  3. Daniele says:

    Dear Jim,

    first of all let me tell you I really like your articles.
    As I’m for the first time trying to create my first dw, I’d like to ask you a question.

    You wrote: “The snowflake is often used to address the challenge of having dimensions which need to use the date at different parts of the hierarchy. For instance, I want to capture the measures at a monthly or quarterly level, not at a specific date. “.

    I can’t get what you mean by ” I want to capture the measures at a monthly or quarterly level, not at a specific date. “; are you saying that a flattened dimension would give me troubles in doing such type of analysis?

    • Jim McHugh says:

      Daniele, What I am saying here is that you can use a single, flattened hierarchical dimension to address multiple variations of a single metric. For instance, If I wanted to know how many widgets I sold each day, I can use a DIM_DATE table that only contains information about days. If I wanted to know about monthly sales, quarterly sales or yearly sales I would have to connect separate dimensions (DIM_MONTH, DIM_QUARTER, DIM_YEAR) to address these reporting needs. If we flatten this hierarchy, where a date contains month, quarter and year information, we can answer this question with no additional joins to other dimensions. Therefore a flattened dimension provides you with far more flexibility when reporting.