Now that we have established a solid foundation with dimension tables we will now turn our focus to the fact tables. Fact tables are data structures which capture the measurements of a particular business process. The measurements (quantity, amount, etc.) are defined by the collection of related dimensions. This collection of dimensional keys is called the grain of the fact.
Types of Fact Tables
The three basic fact table grains are the transactional, the periodic snapshot and the accumulating snapshot.
Transactional fact tables are the most common fact in dimensional modeling. Transactional fact tables capture the measurement at its most atomic dimensional level, at the point-in-time of the transaction. This allows the fact table to provide robust dimensional grouping and roll-up and drill-down reporting capabilities to the business user.
Periodic Snapshot fact tables capture the state of the metrics at a specified point-in-time. Periodic Snapshot Fact tables are used to quickly assess and review the performance of the measures over specified time intervals (week, month, quarter, year, etc.).
As you can see the Periodic Snapshot fact table and the Transactional fact table can be very similar. In the example above, the sales are rolled up to the month in the Periodic Snapshot fact table. In this table, you are unable to drill down past the month, as this is the lowest grain of this Periodic Snapshot fact table.
Accumulating Fact tables are used to show the activity of progress through a well-defined business process and are most often used to research the time between milestones. These fact tables are updated as the business process unfolds and each milestone is completed. The most common examples can be found in order and insurance processing. Here a single fact table may contain the important business processes milestones upon which the organization wishes to measure performance.
Star Design – One Fact or Multiple Facts
When defining your data model, you need to ask yourself a couple of questions when creating fact tables.
- Do these measures occur simultaneously?
- Are these measures defined at the same grain?
If either answer to these questions is “no” then you need to create separate fact tables to contain these measures.
The classic example of this can be seen in a sales subject area. You are asked to capture the measures quantity ordered and quantity shipped. You will be analyzing them by date, customer and product. Should this be in one fact table or two? Ask yourself the questions above: 1) Do these measures occur simultaneously? NO, because shipping can happen much later and by multiple shippers. 2) Are these measures defined at the same grain? No, because we may have the need to add additional shipping and shipper information to the fact. Therefore, these measures should be broken into two separate fact tables using conformed dimensions of date, customer and product.
Drill Across vs. Joining Facts
Fact tables are like magnets sharing the same poles, there is no joining these two objects. Facts tables have unique dimensional characteristics to address the specific business measurements for which the fact table was created. Because of this unique design, you cannot join two fact tables together as they will be missing some dimensional characteristic, impacting the grain of the measure. If for some reason the two fact tables share the same exact dimensional keys, then these fact tables should be redesigned and consolidated into a single fact table. Many novice dimensional model developers will try to join fact tables together via one or more common dimensional attributes, unfortunately, this can lead to inaccurate results because joining fact tables in this manner can result in a Cartesian product.
The proper way to compare the results from two distinct process measures is to “drill across”. Drilling across, unlike drilling down or rolling up is a multi-step process comprised of summarizing the desired measure(s) from each star using the common grain and then combining the results into a single data structure.
Fact Table Surrogate Keys
There can arise a situation where there are duplicate fact records. Although all dimensional data warehouse designers should avoid this issue it can occur based on the business requirements. Should this issue occur, use a surrogate key as the primary key to capture these records.
Finally, we have transaction fact tables which contain no measures. We call these factless fact tables. These tables are used to capture the action of the business process. These actions can be reported upon or summarized later as periodic facts. For example, a criminal case is a simple fact with no measures but can have a lot of dimensional attributes associated with this fact. Each month I may want reports based on these dimensional attributes. For instance: how many cases of grand theft auto have we had in the past twelve months by month, by precinct and by amount band? What is the average time for a case to close by case type over the past 5 years, by month and by case type? Although the fact contains no measure the data contained is incredibly important to the customer.
Next week we will dive into the periodic fact tables and how to use them appropriately.