Menu Close

Data Warehouse Design Techniques – Rapidly Changing Dimensions

In my last blog post, I showed how to capture changes for slowly changing dimensions. Today we will see how to handle rapidly changing dimensions.Rapidly Changing Dimensions

Rapidly changing dimensions are dimensions where the attribute values of the dimension change frequently causing the dimension grow rapidly if you a have designed the dimension to capture the changes as a Type 2 dimension. The rapid growth of this dimension will impact maintenance and performance as the dimension grows. Let’s take a look at solutions to this challenge.

Mini-Dimension (SCD Type 4)

Mini-dimensions contain the rapidly changing attributes of the original dimension and are treated as a stand-alone dimension. This dimension will band the attributes together instead of treating them as individual pieces of data. The fact table can then connect to the mini-dimension to based upon the data provided or this mini-dimension could be connected to the “parent” dimension using the surrogate key as a Type 1 or Type 2 SCD attribute.

An example of this can be seen in a customer table where the client wishes to store information which may change at least once per year and in some cases many times per week. In the table below we see that the client wishes to capture the following rapidly changing attributes: customer’s age, income, the number of lifetime purchases, rating, account status, and credit score.

 

In order to limit the growth of this table, we can remove these rapidly changing attributes and move them to a table called DIM_CUSTOMER_ATTRIBUTE.

Since there are thousands of permutations of these attributes it is recommended that you band the attributes. For example, we may want to band age in groups of 10 years, income in bands of $10,000, and lifetime purchases and credit score in bands of 25 while providing a single value for the other attributes since they are limited in number. Here is a partial example of the data which may be stored in the mini-dimension table.

Junk Dimension

If we only cared about the customer status (Collect, Current, Payment – On-Time, Payment – Late) from the above example a Junk Dimension might be a better solution. A junk dimension is a dimension containing one or more unrelated attributes and is used to avoid having a large number of foreign keys in the fact table.

Conclusion

As you can see there are multiple solutions to handle the challenge of a rapidly changing dimension. The most important thing to remember is to take your time, profile your data. This will help ensure that your model is flexible and scalable for future growth and to meet the changing needs of the business.

Posted in Blog

4 Comments

  1. Foo

    “or this mini-dimension could be connected to the “parent” dimension using the surrogate key as a Type 1 or Type 2 SCD attribute”

    From Data Warehouse Toolkit 3rd edition, Chapter 5 pg. 160

    “You wouldn’t want to track this attribute as a type 2 because then you’d be capturing volatile changes within the large multimillion-row dimension and avoiding this explosive growth was one of the original motivations for type 4.”

    • Jim McHugh

      Foo, Although the statement that you point out from the Data Warehouse Tool Kit is a factual statement I think that you are misunderstanding the design options I have stated in this post. In the example provided I point out that I can create a mini-dimension (DIM_CUSTOMER_ATTRIBUTE) that contains all of the permutations of the rapidly changing customer attributes. I then state that it is preferred that these attributes be banded to eliminate thousands of records. Now, this mini-dimension can then be attached as a snowflake set of attributes to the main dimension or to the fact as a standard dimension. The book you cite states that attaching it directly to the fact table is preferred, but I can see instances where I may want that mini-dimension to be attached to the DIM_CUSTOMER dimension directly using a Type 1 or Type 2 SCD. All I am stating is that there are design options that you, as a data architect, can take to address your customer’s challenges.

  2. Foo

    I don’t think I’m misunderstanding as you’ve restated the misapplication here in the comments again.

    I’m focused on treating the mini-dimension key in DIM_CUSTOMER as a Type 2 SCD. What you’re describing is a Type 5 approach which is characterized by using a Mini-Dimension with a Type 1 Outrigger in the ‘parent’ dimension.

    To treat the mini-dimension key in the parent dim as a Type 2 attribute would mean you’re capturing the volatile changes happening in the mini-dimension in the parent dimension as well, thus nullifying the purpose of the mini-dimension.

    Hopefully I’m making sense. I appreciate your content.

    • Jim McHugh

      Foo, a Type 5 outrigger, is a reasonable solution. Again, if I band the data in the mini-dimension, I take away the rapidity of the changes, and therefore this becomes a suitable solution. As I stated earlier, this is a possible solution, not a hard-and-fast rule to apply. I can see situations where this could be a viable solution. You can consider this an option or not; the design to the customer requirements is for you to design.

Leave a Reply

Your email address will not be published. Required fields are marked *