There will be times when you arrive at a many to many relationship between a fact and a dimension. This many to many relationship is in violation of the dimensional modeling rules, so how do you handle this multi-valued relationship? Bridge tables. In this blog post I will take you through the importance and usefulness of bridge tables and how and when to use these data structures.
Multi-valued dimensions are more common than most people think and are often “refactored” to a single valued dimension. Coming from a publishing background, sales of articles come to mind as a multi-valued dimension. The sale of articles must have one author but could very easily have two or more authors. How do you handle this in the warehouse? You could define one author as the primary author or you could create a really long field and place all of the authors of the article in this field. But how can you find the total sales for all of the articles written by a single individual who has collaborated on many articles as well as those articles written on his/her own? According to this requirement, each author needs to be uniquely identified and properly associated with the articles they have authored. Below is a simple sales fact table.
Unfortunately, this design will over count the sales for those articles which have multiple authors. To address this issue, we will need to use bridge tables to connect the gap between the fact and dimension tables.
A better way to model this star can be seen below.
In this model you see two additional tables, DIM_AUTHOR_GROUP and BR_AUTHOR_GROUP. These are the “bridge” which allow authors to be grouped together while maintaining each author as an individual entity. I also added a sales weighting factor in the BR_AUTHOR_GROUP table, so that each author can receive the appropriate proportion of the sale. This additional field will provide the users with a more accurate sales amount for each author and should eliminate the possibility of overcounting a sale when used correctly.
Even more common than multi-valued dimensions are multi-valued attributes. Multi-valued attributes can contain more than one value for a single dimensional attribute. Individuals have multiple phone numbers and email accounts, patients can have multiple medical diagnoses, multiple people can own a bank account or loan.
The most common way to address this challenge is to flatten the repeating attribute into separate columns. We often see this in addressing the email/phone number challenge by creating separate home, work, mobile phone numbers and home and work email addresses.
Another way to do address multi-valued attributes is to create a relationship between the dimension table and a secondary dimensional table (outrigger table). Let’s walk through an example by using employee skills. Since an employee can have more than one skill we will need a dimensional attribute bridge to associate these skills with the employee. This is done by simply associating the parent dimension (DIM_EMPLOYEE) with the repeating attribute values contained in the outrigger table (DIM_EMPLOYEE_SKILLS).
As you can see the bridge table concept is simple, yet powerful way to address a many to many fact/dimension relationship but using them does come with a cost. The loading of these tables becomes more complex, there is an increased possibility of double counting, changing data can cause errors in reporting and some reporting tools cannot use these data structures.
Next week we will move on to fact tables.