Data Warehouse Design Techniques – Degenerate & Junk Dimensions

Data Warehouse Design Techniques – Rapidly Changing Dimensions
January 25, 2017
Data Warehouse Design Techniques – Snowflaking
February 8, 2017
Today we will look at the advanced dimensional data warehouse design techniques of Degenerate and Junk Dimensions.

Degenerate Dimension Tables

As I stated in an earlier blog post, degenerate dimensions are not physically implemented data structures. Degenerate dimension attributes exist in the fact table as a part of the primary key but have no corresponding dimension. Let’s look at the most common example of a degenerate dimension, the invoice.

Invoices contain a lot of information which can be complex and sometimes difficult to model. All invoices contain information about products and customers, but some contain additional information like shipping information and product location (warehouse) information. Starting with a conceptual the model for an invoice, our model may look like this:

Some may want to try to create an invoice dimension to capture the invoice information in a single dimension as can be seen in this logical model:

While this model accurately depicts the invoice transaction there are several issues with this model. The greatest among these issues would be that the invoice dimension can be almost as large as the invoice line fact table based on the average number of products purchased. You shouldn’t design a dimension to grow in proportion with the fact table because it makes the analysis of the fact more difficult because of the size of the dimension.

 

Another modeler may want to depict the invoice as a fact and join that fact to the invoice line fact table in a hierarchical relationship.

This too causes issues with the size of the two tables you are joining together as well as the challenge with joining two fact tables together. (In general, you should not join fact tables together because of differing grains)

 

The best way to model an invoice is to use a degenerate dimension for the invoice number.

By using the actual invoice number and invoice line number as a key attribute in the invoice line fact table you can eliminate the challenges the other models have concerning the ability to quickly and accurately analyze the fact using the dimensional attributes.

Junk Dimension Tables

Junk dimensions are used to reduce the number of dimensions in the dimensional model and reduce the number of columns in the fact table.  A junk dimension combines two or more related low cardinality flags into a single dimension. An example of this may be car color (red, black, blue, etc.) and body style (sedan, van, SUV, etc.) As you can see these are limited in number and, if created as single dimensions, the dimensions would be limited to a single attribute. In order to eliminate these small dimensions, we create a single “junk” dimension which cross joins all possible attributes into a single dimension which will be used in the fact table.

 

By combining these into a single dimension we have made the model easier to understand and use by both IT and business users.

These two types of dimensions are useful and powerful in creating better to use and understand data models.

1 Comment

  1. Junk Dimension Tables example you have mentioned, does not shows repetition?