Designing a Relational Data Warehouse

Data Warehouse Design Methodologies
November 9, 2016
Designing a Dimensional Data Warehouse – The Basics
November 30, 2016
The use of relational modeling is prevalent in today’s business landscape as most applications are built using relational databases. Relational modeling is based on mathematical set theory and was first described in 1969 by E.F. Codd. He stated that a grouping of elements (tuples) could be related to other tuples in such a way so as to limit the amount of data stored for a specific tuple and therefore optimize the storage of such data. This sounds difficult! However, it is easy. Let’s look at a real world example.

Take a look at the information collected about John Smith below.

John Quincy Smith
DOB: 12/31/1969
National ID: 123-45-6789
1234 Main Street, Anytown, VA 20001 (h)
8000 Corporate Drive, Suite 1022, Sombertown, VA 20101 USA (w)
jsmith@company.com (w)
john.smith@gmail.com (p)
jqsmith@yahoo.com (p)
(703) 555-1212 (w)
(703) 212-1555 (h)
(571) 234-9876 (c)
1234-5678-9012-3456 (Visa)

Could all of this information be stored in a single table? Yes, absolutely. However, storing data in a single table is inefficient. In this example, we added a second personal email address column. For every record, regardless of whether they have a work address or second personal email address storage, although minimal, will be allocated. Another challenge is the need to add additional columns. What if you want to add the ability to choose from more than stored credit card to pay for your purchase or you want to add and save multiple shipping addresses. In each instance, you will need to add columns to this table and modify a lot of code to ensure you are choosing the correct credit card and shipping address.

blog004-transaction

The more efficient way is to create a table for each of these logical groupings and continue to break these groupings down to their most basic elements. This will provide you with the greatest amount of flexibility using the least amount of storage space.

First, begin by creating logical groupings of data.

Personal Address Email Phone Number Credit Card
Name Address Line 1 Email Address Phone Number Credit Card Number
DOB Address Line 2 Email Type Phone Type Credit Card Type
SSN City
State/Province
Postal Code
Country

This is a good start but there are still some repeating values which we will look up, and you could possibly add, delete or change over time.

Address Type Email Type Phone Number Type
Address Type (Home & Work) Email Type (Work & Personal) Phone Type (Work, Home & Cell)
State Province Country Credit Card Type
State Province Name Country Name Credit Card Type (Visa & MasterCard, etc.)

blog004-2nf

Now connect the tables by joining the appropriate IDs.

blog004-3nf

This is a basic relational data model. Each transactional system in your organization will most likely have data models which may look similar to this one.

How does this transaction based relational data model differ from a relational data warehouse data model? Essentially it does not. The same principals are used in both instances although the amount of data stored is usually much larger in the data warehouse. One significant difference may be the addition of fields to the tables in the warehouse to identify the source of the data. Using relational modeling techniques to create a data warehouse for some or all of your source application data will create, a subject-oriented, non-volatile, time variant and integrated data store for your corporate data. This relational data structure is the foundation upon which you can build reporting, and analytic data structures called Data Marts, which we will discuss in a future blog post.

1 Comment