Data Warehousing

March 8, 2017

Data Warehouse Design Techniques – Ragged Hierarchical Dimensions

In my last post, we discussed the creation of simple hierarchical dimensions. In this post, I will discuss how to handle those ragged hierarchies, those which can skip levels in the hierarchy. Snowflaking There may be instances where you wish to normalize the hierarchy into a snowflake. Below is a city, state/province, country hierarchy which can easily be converted into a snowflake model. Here we would simply take the list of all of the countries and place them into a country dimension, the state/provinces into a state/province table and finally a city into a city table. While loading these dimensions, […]
February 15, 2017

Data Warehouse Design Techniques – Simple Hierarchical Dimensions

Today, continuing through our look at the advanced modeling techniques of dimensional data warehouses, we will take a look at simple hierarchical dimensions. Simple Hierarchical Dimensions Hierarchical dimensions are those dimensions which have a parent/child relationship. In simple hierarchies there every child has a parent at the level above with no skipping of levels. A date (day, month, quarter, year) is the most common example of a simple hierarchical dimension and one that is used by all dimensional data warehouses. The question for the data modeler is how to model the hierarchy? Let’s use the date dimension for some initial […]
February 8, 2017

Data Warehouse Design Techniques – Snowflaking

We will continue our deep dive through advanced dimensional data warehouse design techniques by discussing snowflaking. Snowflaking Dimension Tables According to Dr. Kimball, dimensions should be designed as long, denormalized records. The reason for denormalization is to ensure maximum performance of the DW/BI queries. Snowflaking is the normalization of the dimension tables by removing all redundant attributes into separate dimension tables. The reason for normalizing is often the desire to “enhance” the design and reduce the storage costs of the warehouse. Although this normalized design appears to optimize the storage and maintenance of the warehouse it actually introduces maintenance complexities […]
February 1, 2017

Data Warehouse Design Techniques – Degenerate & Junk Dimensions

Today we will look at the advanced dimensional data warehouse design techniques of Degenerate and Junk Dimensions. Degenerate Dimension Tables As I stated in an earlier blog post, degenerate dimensions are not physically implemented data structures. Degenerate dimension attributes exist in the fact table as a part of the primary key but have no corresponding dimension. Let’s look at the most common example of a degenerate dimension, the invoice. Invoices contain a lot of information which can be complex and sometimes difficult to model. All invoices contain information about products and customers, but some contain additional information like shipping information […]
January 25, 2017

Data Warehouse Design Techniques – Rapidly Changing Dimensions

In my last blog post, I showed how to capture changes for slowly changing dimensions. Today we will see how to handle rapidly changing dimensions. Rapidly Changing Dimensions Rapidly changing dimensions are dimensions where the attribute values of the dimension change frequently causing the dimension grow rapidly if you a have designed the dimension to capture the changes as a Type 2 dimension. The rapid growth of this dimension will impact maintenance and performance as the dimension grows. Let’s take a look at solutions to this challenge. Mini-Dimension (SCD Type 4) Mini-dimensions contain the rapidly changing attributes of the original […]
January 18, 2017

Data Warehouse Design Techniques – Slowly Changing Dimensions

In my last blog post, I demonstrated the importance of conformed dimensions to the flexibility and scalability of the warehouse. This week we will discuss the importance of capturing the dimensional change in slowly changing dimensions.Slowly Changing Dimensions Slowly Changing Dimensions (SCD) are the most commonly used advanced dimensional technique used in dimensional data warehouses. Slowly changing dimensions are used when you wish to capture the changing data within the dimension over time. There are three methodologies for slowly changing dimensions.   Type 1 – For this type of slowly changing dimension you simply overwrite the existing data values with new […]
January 11, 2017

Data Warehouse Design Techniques – Conformed Dimensions

In my last blog post, I presented to you some of the more widely used advanced technical dimensional design techniques. Over the next several of weeks, I will provide detail to each of the design techniques. Conformed Dimensions Conformed dimensions are those dimensions which have been designed in such a way that the dimension can be used across many fact tables in different subject areas of the warehouse. It is imperative that the designer plan for these dimensions as they will provide reporting consistency across subject areas and reduce the development costs of those subject areas via reuse of existing […]
January 4, 2017

Advanced Dimensional Data Warehouse Design Techniques

In an earlier blog post, I walked you through the basics of dimensional data warehouse design by introducing you to dimension tables, fact tables and star schema design. For this blog post, I will introduce some of the more widely used advanced dimensional design techniques which I will describe in detail in the coming weeks. DIMENSIONS Conformed Dimensions Conformed dimensions are those dimensions which have been designed in such a way that the dimension can be used across many fact tables in different subject areas of the warehouse.   Slowly Changing Dimensions Slowly changing dimensions are the second most commonly […]
December 21, 2016

Data Warehousing Documentation Review

One of the most important artifacts delivered as a part of any project is documentation. Documentation is the bane of IT development teams. Most developers feel “if it was hard to write, it should be even harder to understand and modify.” When developing and delivering a data warehouse documentation is critical to the success of the project. This documentation will help both the business users and the technical teams understand the source, the transformation and storage of the data they need to consume. These documents are the foundation upon which the warehouse will be built.   Business Requirements Document (BRD) […]
December 14, 2016

Data Warehousing Best Practices

There are many times when you completed a task only to say “I wish I would have known that before I started this project” Whether it is fixing the breaks on your car, completing a woodworking project or building a data warehouse, best practices should always be observed to ensure the success of the project. These are seven of the best practices I have observed and implemented over the years when delivering a data warehouse/business intelligence solution. As you will see, most of these are not technical solutions but focus more on the soft skills needed to ensure the success […]
November 30, 2016

Designing a Dimensional Data Warehouse – The Basics

Last week I wrote about Relational (Atomic) Data Warehouses and how to create these data structures. This week we will look at Dimensional Data Warehouses and how they differ from the Relational Data Warehouse.Creating a dimensional data warehouse is very different from creating a relational data warehouse. In the relational data warehouse, you are looking to store every atomic element which the organization has deemed to be important from the source applications and storing this data in a relational structure. Relational data structures are optimized for transactional speed and the limiting the use of disk space which is why this […]
November 17, 2016

Designing a Relational Data Warehouse

Most applications today are built using relational databases.  This relational modeling is based on mathematical set theory.  It was first described in 1969 by E.F. Codd who stated that a grouping of elements (tuples) could be related to other tuples in a way that would limit the amount of data stored for any specific tuple and therefore optimize the storage of such data. It’s actually easy to do.  Let’s look at an example using information collected about John Smith. John Quincy Smith DOB: 12/31/1969 National ID: 123-45-6789 1234 Main Street, Anytown, VA 20001 (h) 8000 Corporate Drive, Suite 1022, Sombertown, […]
November 9, 2016

Data Warehouse Design Methodologies

There are two traditional data warehouse design methodologies came of age in the 1990’s, that of Bill Inmon’s Top-Down Atomic Data Warehouse and that of Ralph Kimball’s Bottom-Up Dimensional Data Warehouse. These methodologies have been used over the past 20 years to create informational data stores for organizations seeking to leverage their data for corporate gain. In this article, we will compare and contrast these two methodologies.   Atomic Data Warehouse – Bill Inmon Bill Inmon’s Atomic Data Warehouse approach is strategic in nature and seeks to capture all of the enterprise data in 3rd Normal Form and store all […]
November 2, 2016

Requirements for a Successful Data Warehouse Project

In order to have a successful data warehousing project you will need the following five challenges addressed: #1. Executive Sponsorship All projects need executive sponsorship. Without senior executive leadership, most corporate projects will likely fail. Data warehouse projects are no different, in fact, these projects are even more in need of executive sponsorship than most business initiatives. It is imperative that the organization’s corporate leaders infuse the enterprise with excitement concerning the success of a data warehousing project. These executives must also tie the success of the project to each and every one of their direct reports. This will ensure […]
October 26, 2016

Data Warehousing – The Basics

What Are Data Warehouses? A data warehouse is a large, central store of data from disparate source systems across the enterprise optimized for analysis and reporting. The data is extracted from the source transaction based application databases, transformed into new data structures through the applying a set of rules and functions against the extracted data before loading the transformed data into the data warehouse data structure which is optimized for reporting. This central data store provides the business user with a snapshot of the business at a point-in-time. The data warehouse is used in making both strategic and tactical business […]
October 25, 2012

Jim McHugh of NuWave Solutions Presents at GEOINT

Jim McHugh of NuWave Solutions presents, “The Geospatial Data Challenge – Too Much Data / Too Little Information” and discusses innovative data warehousing and business analytics solutions at the 2012 GEOINT Symposium. See full press release on PRWeb here!  
Contact