In this blog post, a part of my deep dive series, I will focus on Degenerate Dimensions in this post.
When introducing the concept of dimensional modeling, I like to compare it to the English language. I start with dimensions, which are essentially nouns (person, places, things, concepts, or ideas). The dimensions are containers of adjectives, describing the dimension (noun). Facts are actions (verbs) that connect two or more dimensions.
Degenerate Dimensions, as the name implies, are those dimensional attributes that lack the distinctness of structure found in regular dimensions. Degenerate dimensions are those attributes that do not fit into any single dimension table and, therefore, only exist as a key value in a fact table used for filtering queries. This is somewhat contrary to dimensional modeling, and therefore we need to understand the reason for the degenerate dimension and when we should use this concept.
Let’s review the definition I have provided above. First, degenerate dimensions are dimensional attributes. That means these attributes are descriptive of a dimension and can be used to filter the fact table. The next part of the definition, that lacking the distinctness of structure, is where we see the separation from a typical dimensional attribute. A revised way to look at this is everything about the degenerate dimension is contained in the attribute; therefore, it would be a dimension table with a single key attribute. Finally, the definition includes by stating it is used in fact tables for filtering, grouping, and sorting. As we can see, this attribute is needed in the fact table because it provides all of the abilities a normal dimensional attribute allows.
Believe it or not, Degenerate Dimensions are relatively common and occur in the most basic of dimensional models. Let’s look at the conceptual data model for an order/invoice. For a purchase, we would have a Customer (dimension) who placed an Order (fact) via an Invoice (dimension) for a Product (dimension) (see figure 1.)
From a conceptual point of view, we are fine. It all works as planned, no issues.
Let’s move to a logical model (see figure 2.)
OK, I can easily define the customer and product dimensions, but I have a challenge finding attributes for the invoice dimension. All I could come up with is the Invoice Number and an Invoice Line Number. Well, that would make the dimension as deep as the fact table, so what value is a dimension with attributes that cannot live outside the fact? Answer: none. Therefore, I need to consolidate the Invoice Number and an Invoice Line Number into the Order fact as degenerate dimensions (figure 3.)
Now that I have moved the Invoice Number and an Invoice Line Number into the fact, I see that I have reached an optimal solution to my modeling challenge. I can still easily find all of the sales by customer and product. I can also easily filter, group, and sort what was purchased together in a single transaction via the invoice number.
Let’s look at another example. A long time ago, I worked for a large insurance company. As we all know, insurance companies sell policies on which claims are made. When we modeled out the requirements conceptually, we had something like this for an insurance claim (figure 4.)
But when we moved to a logical perspective, we modeled it this way (figure 5.)
As you can see, the claim dimension is a degenerate dimension because everything we want to associate with the claim is found in the fact, leaving the claim dimension with only a claim number. We decided to model it this way because we didn’t want to snowflake claim type from the claim dimension. That left us with only a claim number in each of these dimensions; therefore, we made this attribute a degenerate dimension. Our decision to use a degenerate dimension provided for faster query response.
In conclusion, Degenerate Dimensions are references to the operational world and are useful in filtering, grouping, and sorting facts. You always need to be careful when implementing degenerate dimensions and follow the process I describe above. Remember, degenerate dimensions are not a panacea, and the misuse of degenerate dimensions can lead to unnecessary use of space and poor performing queries.