Data Warehousing

January 8, 2020

What Is an Operational Data Store?

The definition of an Operational Data Store (ODS) is a subject-oriented, integrated, volatile collection of current data designed to support the tactical decision-making of an organization. An ODS can also be used as a data hub, used to synchronize data between disparate and unconnected applications within the enterprise. Let’s breakdown the definition so we can fully understand the ODS. First, we state that an ODS is subject oriented therefore it should be centered around a single business concept, such as a customer or product. Next, we state that an ODS is integrated. Here we are stating that we can pull […]
December 26, 2019

Data Refresh Techniques

The three keys to warehouse project success are to quickly, efficiently and consistently refresh the data in the warehouse. Here are four ways to design the process with some being generally more successful than others. Refresh the Dimensions, then the Facts This approach attempts to make sure that the process is completed with minimal negative impact should an error arise in the ETL workflow. We accomplish this by inserting/updating (upserting) the Dimension Data, then move on to the Fact Data. Our reasoning is that modifying dimensions has no impact on the reports. This isolates the fact loading workflow as the […]
November 26, 2019

Drilling Down / Drilling Through (Across)

You often hear “drilling down” and “drilling across” from business intelligence professionals, but what does it mean and how do you, as a data modeler, design for this requirement? This blog post will provide you with insights on how to do this. To drill down means to decompose fact data via hierarchical dimensional attributes. Time and geography (location) are good examples of hierarchical dimensions. In a complete time dimension, you will have the day, month, quarter and year hierarchy. In a geography dimension you may have a city, district, state, and region hierarchy. Consider your shown a dashboard with year […]
October 23, 2019

Updating / Replacing Fact Records

You will inevitably come across a time when a source system will allow the updating of transaction records. This drives data modelers, ETL developers, analytics designers, and analytics users crazy. After all the source system stated that this information was a “fact”, leading all to believe it would not change, now you must design for and resolve the changing fact data. In this blog post we will discuss the options and the impact on these decisions will have on your warehouse and BI solution. Let’s start with the easy and move to the complex. The easiest thing to do when […]
September 25, 2019

Late Arriving Dimensions

In my last post I discussed late arriving facts, and although it is slightly annoying to receive delayed transaction data, the impact on the ETL and accuracy of the data in data warehouse is minimal. As we will discuss in this blog post, this is not the case for late arriving dimension (early arriving fact) data. As I have discussed throughout this blog series on designing and loading dimensional data warehouses, we always load data in the following order: lookup tables, then dimensions, and finally facts. But what happens when you get fact data before the dimension data arrives or […]
August 21, 2019

Late Arriving Facts

Late arriving facts are those transactions which are delayed in arriving to the warehouse. For Type I data warehouses there is no real impact but for Type II warehouse models there is a minor challenge that the ETL developer must remember to handle to ensure the late arriving transaction corresponds with dimensional attributes at the time of the transaction. If we have a Type I warehouse or are always assured that we only receive fact data at the same time as the dimensional data then we can use the following simple query where we find the current active record for […]
July 15, 2019

Factless Fact Tables

Factless facts are those fact tables that have no measures associated with the transaction.  Factless facts are a simple collection of dimensional keys which define the transactions or describing condition for the time period of the fact. You may question the need for a factless fact but they are important dimensional data structures which capture important information which can be leveraged into rollup measures or as information presented to a user. The most common example used for factless facts are student attendance in a class. As you can see from the dimensional diagram below the FACT_ATTENDANCE is an amalgamation of […]
June 20, 2019

Period-To-Date Facts

We have all come across times when our customer wants to know how the organization is currently doing. They often want to know how they are measuring up against this time last year or against the projected measure. The most common examples of these request are a year-to-date calculation and a budget vs. actual analysis. In this blog post I will describe how to efficiently address these common business requests. Year-To-Date (YTD) Calculations Our customer has stated that they wish to show a YTD metric for sales which can be broken down into quarterly and monthly metrics as well. In […]
May 28, 2019

Loading Accumulating Snapshot Fact Tables

Often management looks for bottlenecks in corporate processes so that they can be streamlined or used as a measurement of success for the organization. In order to do achieve these goals we need to measure time between two or more related events. The easiest way to report on this time-series process is to use accumulating snapshot facts.  Accumulating snapshot facts are updatable fact records used to measure time between two or more related events. The most common example of this type of fact can be seen in order processing. Let’s take a look! Order processing consists on many serialized processes. […]
April 24, 2019

Loading Transaction Fact Tables

This blog post will focus on loading transaction fact tables, subsequent posts for peoiodioc and accumulating snapshots will follow in the coming weeks. Loading fact tables is very different than loading dimensions. First, we need to know the type of fact we are using. The major types of facts are transaction, periodic snapshot, accumulating snapshot and time-span accumulating snapshots. We also need to know the grain of the fact, the dimensional keys which are associated to the measurement event. Let’s say we want to measure product sales of by customer, product, and date attributes. The source transaction system may provide […]
March 20, 2019

Updating Type II Slowly Changing Dimensions

In this blog post I will provide an example of how I implement Type II dimensions in my data warehouses. The process I go through when refreshing data into a Type II dimension is to first land the data from the source system. I always create a separate schema to hold the data, as it is presented from the source system with no transformations. The tables and columns are named exactly as they are in the source system. The basic idea here is to quickly off-load the data from the source system, thereby eliminating any long running or resource intensive […]
February 20, 2019

Joining Fact Tables

Joining fact tables can be done but there are some inherent risks so you need to be careful when joining fact tables is required. In the following simple scenario, we have a fact of authors to articles and a separate fact of articles to pageviews. Our customer has asked for the ability to 1) find the authors who have provided the most content in a given time period, find the articles which have the greatest number of pageviews for a given time period and 3) find the authors with the highest number of pageviews in a given time period. Our […]
July 28, 2017

Data Warehouse Design Techniques – Constraints and Indexes

In this week’s blog, we will discuss constraints and indexes. In data warehousing, like in life, constraints are things we love to hate. Constraints keep us from making mistakes, which in most cases is a good thing, until we come across an anomaly which needs to be addressed but the constraints prevent this from happening. Most of the time indexes help us to find data faster, but that is not always the case.   What are Indexes? Indexes are data structures which hold field values from the indexed column(s) and pointers to the related record(s). This data structure is then […]
July 5, 2017

Data Warehouse Design Techniques – Aggregates

In this week’s blog, we will discuss how to optimize the performance of your data warehouse by using aggregates. What are Aggregates? Aggregates are the summarization of fact related data for the purpose of improved performance. There are many occasions when the customer wants to be able to quickly answer a question where the data is at a higher grain than we collect. To avoid slow responses due to system summing data up to higher levels we can pre-calculate these requests, storing the summarized data in a separate star. Aggregates can be considered to be conformed fact tables since they […]
June 16, 2017

Data Warehouse Design Techniques – Derived Schemas

Getting the correct answer is one of the most important requirements of a data warehouse, but this is only part of the requirement. The full requirement is to provide the correct information, to the user at the right time. The information is no good to the user if they need the answer today but you need to write a custom report which will take a week to develop, test and deploy. Although you will be able to provide the correct answer, it will be late and therefore of limited to no value to the decision-making process. How can we address […]
May 24, 2017

Loading Hierarchical Bridge Tables

This blog article is a follow-up to the Ragged Hierarchical Dimensions article I posted a few weeks ago.  In this article, I spoke of using a hierarchical bridge table at a high level, today I will discuss the nuances of this method in more detail.Let’s begin at the dimension table. In order to capture the data with the appropriate level of detail you will need to design your dimension with recursive fields.I have designed this dimension table as a type 2 dimension table and I include the recursive (parent) field as an attribute. This allows me to capture the point-in-time […]
April 21, 2017

Data Warehouse Design Techniques – Accumulating Snapshot Fact Tables

Today we will look at a different type of fact table, the accumulating fact. Accumulating Fact Tables Accumulating Fact tables are used to show the activity of progress through a well-defined business process and are most often used to research the time between milestones.  These fact tables are updated as the business process unfolds and each milestone is completed.  The most common examples can be found in order and insurance processing. Here a single fact table may contain the important business processes milestones upon which the organization wishes to measure performance. A different version of this same accumulating fact table […]
April 5, 2017

Data Warehouse Design Techniques – Snapshot Fact Tables

Last week we defined the three types of fact tables and have established a solid foundation in the definition and use of transactional fact tables. This week we will focus on periodic snapshot fact tables. Snapshot Fact Tables (Periodic) Snapshot fact tables capture the state of the measures based on the occurrence of a status event or at a specified point-in-time or over specified time intervals (week, month, quarter, year, etc.). The Snapshot Fact table and the Transactional fact table can be very similar in design. In the design above the sales are rolled up to the month in the […]
March 29, 2017

Data Warehouse Design Techniques – Fact Tables

Now that we have established a solid foundation with dimension tables we will now turn our focus to the fact tables. Fact tables are data structures which capture the measurements of a particular business process. The measurements (quantity, amount, etc.) are defined by the collection of related dimensions. This collection of dimensional keys is called the grain of the fact. Types of Fact Tables The three basic fact table grains are the transactional, the periodic snapshot and the accumulating snapshot. Transactional fact tables are the most common fact in dimensional modeling. Transactional fact tables capture the measurement at its most […]
March 15, 2017

Data Warehouse Design Techniques – Bridge Tables

There will be times when you arrive at a many to many relationship between a fact and a dimension. This many to many relationship is in violation of the dimensional modeling rules, so how do you handle this multi-valued relationship?  Bridge tables. In this blog post I will take you through the importance and usefulness of bridge tables and how and when to use these data structures. Multi-Valued Dimensions Multi-valued dimensions are more common than most people think and are often “refactored” to a single valued dimension. Coming from a publishing background, sales of articles come to mind as a […]