Data Warehouse Design Techniques – Conformed Dimensions

Advanced Dimensional Data Warehouse Design Techniques
January 4, 2017
Data Warehouse Design Techniques – Slowly Changing Dimensions
January 18, 2017
In my last blog post, I presented to you some of the more widely used advanced technical dimensional design techniques. Over the next several of weeks, I will provide detail to each of the design techniques.

Conformed Dimensions

Conformed dimensions are those dimensions which have been designed in such a way that the dimension can be used across many fact tables in different subject areas of the warehouse. It is imperative that the designer plan for these dimensions as they will provide reporting consistency across subject areas and reduce the development costs of those subject areas via reuse of existing dimensions. The date dimension is an excellent example of a conformed dimension. Most warehouses only have a single date dimension used throughout the warehouse.

Conformed Dimension – Challenges

  • Structural Differences – in some cases you will have source tables which contain different table structures These may include:
    • additional or missing columns
    • columns with different data types
    • columns named differently containing the same or similar data

These differences can usually be resolved easily but occasionally this can be difficult as the differences must be resolved and agreed to by the business owners of the source systems. Without resolving these issues workarounds will be implemented which, over time, will cause maintenance and reporting issues.

Note the challenges combining the two product tables below. In the PRODUCT_RETURN table, the primary key is PRODUCT_SKU, the PRODUCT_NAME field is defined as varchar(100), PRODUCT_DESCRIPTION is defined as varchar(100) and PRODUCT_PRICE is defined as number(5,2).

In the PRODUCT_SALES table, the PRODUCT_UPC is defined as the primary key. Luckily this field matches the PRODUCT_UPC field in the PRODUCT_RETURN table. The PROD_NAME field is defined as varchar(50), PROD_DESCRIPTION is defined as varchar(150) and PROD_PRICE is defined as number(7,2), all of which are different than that of the PRODUCT_RETURN table. Data profiling and data verification will be necessary to ensure the data between the two tables matches. The PRODUCT_SALES table does not have a column for PRODUCT_CATEGORY. One final difference is the PROD_TYPE, PROD_ACTIVE_FL, PROD_START_DT and PROD_END_DT. These fields are used to capture the change history of a product that is not captured by the PRODUCT_RETURN table.

To resolve these issues and conform the tables I would propose the following conformed product dimension.

  • Content Difference – sometimes you will find that the content stored in the tables are different:
    • Text fields contain upper case vs. mixed case characters
    • Columns contain different/missing data

The data in the PRODUCT_RETURN table appears to match that of the PRODUCT_SALES table with some notable exceptions. The product names do not always match between the two tables and the UPCs repeat in the PRODUCT_SALES table. The PRODUCT_SALES table does not have SKUs therefore, we need to ensure the SKUs in the PRODUCT_RETURN table is reliable historically and can be used as an attribute in the new dimension. Before moving forward these issues must be brought before the source system product owners for a mutually agreeable resolution to these data anomalies.

PRODUCT_RETURN

 

PRODUCT_SALES

Conformed Dimension – Types

  • Shared Dimension – when a fact table shares the same logical dimension as another fact table in another star or data mart. The date dimension is an example of a shared dimension.
  • Conformed Rollup – is a subset of data from a dimension table and the subset of data share the same structure and content. An example of a rollup dimension can be seen in the month dimension shown below. As you can see the month dimension can be linked to the date dimension and used to “roll up” data from a single day to a month.

  • Overlapping Dimension – are those source tables and dimensions which conform via some overlapping attributes. In most instances, the overlapping fields can be moved into a new dimension to avoid overlap, create an easier to maintain model and provide the ability to track changes in the relationships over time via a fact table.

Conformed Dimension – Benefits

Conformed dimensions are the foundation upon which flexible and scalable dimensional models are built. A data modeler must plan for conformance in their design by ensuring that key dimensions of the enterprise are conformed dimensions that can be expanded to include new attributes over time. Dimensional models that lack conformed dimensions will only be able to answer simple questions for the particular data mart for which the dimension was built. When designed properly, conformed dimensions can and will be used by several data marts, providing the customer with insight into their data that exceeds their initial expectations.

1 Comment