Designing a Dimensional Data Warehouse – The Basics

Designing a Relational Data Warehouse
November 17, 2016
Data Warehousing Best Practices
December 14, 2016
Last week I wrote about Relational (Atomic) Data Warehouses and how to create these data structures. This week we will look at Dimensional Data Warehouses and how they differ from the Relational Data Warehouse.

Creating a dimensional data warehouse is very different from creating a relational data warehouse. In the relational data warehouse, you are looking to store every atomic element which the organization has deemed to be important from the source applications and storing this data in a relational structure. Relational data structures are optimized for transactional speed and the limiting the use of disk space which is why this methodology is so prevalent in transaction-based systems. Dimensional models are specifically designed for optimized for the delivery of reports and analytics.

 

Dimensions – What are they?

Dimensional Data Warehouses must have dimensions, right? So what are dimensions? Think of dimensions as containers of adjectives which describe the dimension and will help us limit the fact information we wish to see. Dimension tables can contain a lot of columns if the dimension is robust enough.

All dimensional data warehouses have a date dimension. This dimension is a key in all fact tables and provides context to the fact. The date dimension can be defined at the day level, the hour level, the week level, etc. Most warehouses define the date dimension at the date level and define it is such a way that it can provide roll-up information for weekly, monthly, quarterly, yearly data. If a finer “grain” of time is needed you can create a time dimension as well to provide hour and, if necessary, half-hour, quarter-hour or minute information to the user. Here is an example of a simple date and time dimensions.

dimensions_1

From these two dimension tables you can see everything you want to know about a date or time. Joining the DIM_DATE table with a fact table can be used to help you quickly and efficiently find all of the fact data which occur on Tuesdays in Fiscal Year 2016.

 

Facts – Why do I want them?

Fact tables are the foundation of a dimensional data warehouse. Fact tables contain the measurement of the organization’s transactions upon which the business’ key performance indicators (KPIs) are graded. These tables consist of relatively few number of columns but a large number of records.

There are 2 basic components in a fact table the keys and the measures. The measure is what we are counting, adding, averaging, etc. The measure is limited by information stored in the dimension, therefore the fact must connect to the dimension primary key via a foreign key. The conglomeration of foreign keys will create a unique primary key for that record.

The overarching principal of a fact table is the grain. The grain is the business description of the measured event. An example of defining the grain is “the number of each product sold by product name, by product type, by day, by month, by quarter and by year.” As you can see there are many different grains here (each individual product, product type, day, month, quarter and year). Therefore, the dimensional data warehouse must be able to provide the answers to these requests, quickly and efficiently. In this example the measure is the count of products sold and the lowest “grain” is product and day.

In our example we are measuring the KPI for product sales, therefore, in the simplest format, the FACT_PRODUCT_SALES table may look like the example below.

fact_1

Based on this simple fact table I can easily join this table with the DIM_DATE and the DIM_PRODUCT tables the quickly calculate the number of products sold by a particular date, based on the information stored in the DIM_DATE dimension or on the product name and type based on the information stored in the DIM_PRODUCT table.

 

Star Schema – Who needs them?

This joining of facts with dimensions is called a Star Schema. Here is an example of a star schema.

star_1

From this schema we can quickly calculate the cash sales by store by product for a given month or the number of products sold to those in the 20101 postal code during the month of September, etc. This ability to answer so many questions while storing so little data provides the true power in dimensional modeling.

2 Comments

  1. Best of the best explanation i have read till now.

  2. Naveen Guru says:

    Excellent explanation .Thanks Jim 🙂 🙂