In this week’s blog, we will discuss how to optimize the performance of your data warehouse by using aggregates.
What are Aggregates?
Aggregates are the summarization of fact related data for the purpose of improved performance. There are many occasions when the customer wants to be able to quickly answer a question where the data is at a higher grain than we collect. To avoid slow responses due to system summing data up to higher levels we can pre-calculate these requests, storing the summarized data in a separate star. Aggregates can be considered to be conformed fact tables since they must provide the same query results as the detailed fact table.
When should I use Aggregates?
Aggregates are to be considered for use when the number of detailed records to be processed is large and/or the processing of the customer queries begins to impact the performance of the BI application. Customers of BI applications expect an application which is very responsive to their queries. Aggregates are the way to meet this implied requirement.
The most common example of an aggregate is product sales. In the initial star below we can see that the fact contains the following dimensional details: Product, Customer, Store and Day. As you can imagine for a large store this fact table could contain hundreds of millions of records per day.
This is great when our customer wants to see this level of detail. Unfortunately, the customer’s default request is same store sales month vs. month sales for the past 24 months including month-to-date. Occasionally the customer wants to drill down and see the top products sold during the month. Aggregating this data by month and eliminating the customer dimension we can drastically reduce the number of records in the fact table thereby improving the performance of the most common request by the customer.
To load this new fact table, you use the detailed fact as the source and make the appropriate changes to the ETL to create this new fact. Notice that the dimension tables, DIM_STORE and DIM_PRODUCT, did not change and we simply swapped DIM_MONTH for DIM_DATE to complete the summarization.
Other superior hierarchy attributes could be used to further reduce the number of fact records. For example, stores could also be rolled up by district and region and products could be summarized by category and sub-category.
Now that this summarized data is available the BI tools can take advantage of this performance increase due to the reduced number of records in the fact and provide accurate answers to the users in a very responsive way. When detailed records are needed the base fact table is available to answer those queries. The performance should still be good here because you should be limiting the number of records you are requesting and your indexing strategy should be supporting the use of the detailed fact star for these requests.