Data Warehousing Documentation Review

Data Warehousing Best Practices
December 14, 2016
Advanced Dimensional Data Warehouse Design Techniques
January 4, 2017
One of the most important artifacts delivered as a part of any project is documentation. Documentation is the bane of IT development teams. Most developers feel “if it was hard to write, it should be even harder to understand and modify.” When developing and delivering a data warehouse documentation is critical to the success of the project. This documentation will help both the business users and the technical teams understand the source, the transformation and storage of the data they need to consume. These documents are the foundation upon which the warehouse will be built.

 

Business Requirements Document (BRD)

This is a formal document which requires review and approval. The BRD captures problem the organization is trying to solve and any restrictions on that need to be considered. Essentially, the BRD defines the business user’s high-level specifications (scope) on what they are hoping to achieve from the data warehouse and delivered through a BI reporting or analytic tool along with any assumptions, constraints and risks which impact or may impact the project. This document should also include the as-is and to-be business process diagrams. Finally, this document will contain the actual business requirements statements (“the warehouse must capture …”, “the report must contain …”) and is the foundation upon which the technical design will be built. The BRD should contain the requirements of the system as defined by the business users. All technical jargon should be eliminated from this document as it should be written in the vernacular of the business user.

 

Fact/Qualifier Matrix

A Fact/Qualifier Matrix is a simple spreadsheet with the fact measures across the top and a list of the Dimension attributes down the left side. The intersection of these is the requirement. For example:

  1. The application must be able to deliver the total number of widgets sold by day, month, quarter and year.
  2. The application must be able to deliver the total number of widgets sold by widget type and by day, month, quarter and year.
  3. The application must be able to deliver the total number of widgets sold by widget color and by day, month, quarter and year.
  4. The application must be able to deliver the total number of widgets sold by supplier and by day, month, quarter and year.

 

This simple spreadsheet is an excellent tool used to fully ferret out the detailed reporting requirements from the user. In this example, you may want to go back and ask the business user if they ever may need to know how many widgets they sold by type and by supplier.

 

Conceptual Data Model

This is a high-level model of the warehouse. This model simply focuses on the list of entities and the relationships to one another. No attributes are included in the conceptual model and the number of entities depicted is usually limited to 20 to 50 entities, although some, more complex models, can have up to 100 entities.

Because of the high-level rendering this model, this is an excellent tool to confirm the list of entities and their relationships with the business users.

 

Physical Data Model

This is a fully attributed data model which takes into account the persistent storage technology used. This model will be converted into the physical database data structures of the warehouse. These data structures include all entities and their corresponding attributes and datatypes. This model also documents the physical relationships between entities via primary keys and foreign keys, the necessary indexes, views, materialized views to provide the ETL and BI development teams with a view of the data structures they are using to load or retrieve data. This detailed model is not usually shared with the business users as there is too much technical detail in this model for them to easily consume.

 

Data Dictionary & Source to Target Mapping

This is critical to the overall understanding of the physical data model. In some cases, the data dictionary can be stored in predefined fields or in comments fields for each attribute in the physical data model. Other times, the data dictionary can be a separate document or spreadsheet. The critical element of the data warehouse data dictionary is the definition of the attribute. I like to combine this attribute definition with the source to target mapping document. My data warehouse data dictionaries include the source database, table and column from which the data is pulled, any transformational process which the data undergoes before being loaded, the destination database, table and column name and finally the business definition of the attribute.

This combination provides the business users with a more detailed dictionary entry, allowing them to trace back the definition to the source system and confirm any transformation which happened to that attribute. These definitions of the attributes are derived from the business users and the business users should be involved in the review and approval of the data dictionary.

 

1 Comment

  1. Amazing, i never thought about documenting attributes/fields like this. Now i will…