Data Warehousing – The Basics

Introducing Vilocity 2.0….
April 10, 2015
Requirements for a Successful Data Warehouse Project
November 2, 2016

What Are Data Warehouses?

A data warehouse is a large, central store of data from disparate source systems across the enterprise optimized for analysis and reporting. The data is extracted from the source transaction based application databases, transformed into new data structures through the applying a set of rules and functions against the extracted data before loading the transformed data into the data warehouse data structure which is optimized for reporting. This central data store provides the business user with a snapshot of the business at a point-in-time. The data warehouse is used in making both strategic and tactical business decisions.

 

Why Are Data Warehouses Important?

Data warehouses hold large quantities of source system data over long periods of time. This time-variant data contained in the data warehouses provide the foundation upon which historical analytical and reporting applications are built upon. These applications provide the organization’s senior leadership with a quick and holistic view of the success of the organization over time. Data warehouses can also be used and the source data used for researching an anomaly in the data. This researching of anomalies is often called data mining or data discovery. The information delivered from the data warehouse helps the senior leaders make informed decisions based on facts.

 

What Are the Most Common Data Warehouse Methodologies?

There are two main data warehousing methodologies upon which all other data warehousing methodologies are built. These are the methodologies proposed by Bill Inmon and Ralph Kimball, the two most influential leaders in the formation of data warehousing as a business solution. These two gentlemen see the data warehouse from two very different perspectives:

Inmon Methodology

“A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.” (Inmon, 1995). This is a top-down methodology seeking to provide the “single version of the truth” for each entity the business is measuring in a relational format. Only after all of the full collection of atomic data is collected from the source systems can the data marts for analytics and reporting be created. These data marts are often customized for specific departments or user groups.

Kimball Methodology

“A copy of transaction data specifically structured for query and analysis” (Kimball, 2002). This is a bottom-up methodology seeking to build small dimensional data marts of fact tables (measures) surrounded by dimension tables (entities) for specific departments of user groups. Each data mart provides the data for the analytics and reporting tools. Kimball sees the data warehouse as the collection of all of the data marts in the organization.

Inmon’s methodology sees the data warehouse as a single, colossal repository for enterprise data stored in a relational model, where Kimball’s methodology sees the data warehouse as a collection of subject area data marts, each stored as a dimensional model.

 

Which Methodology Do Most Organizations Use?

Although most organizations try to use the Kimbal Dimensional Methodology, most organizations try to either create their own methodology by taking the best of both methodologies to meet their organization’s specific needs or they use a methodology put forth by a software vendor whose tool meets their current data warehousing needs.

 

Conclusion

Today’s data warehouses play an important role in the success of today’s medium to large businesses because of the timely information they provide to the senior leaders of an organization. Although there are two basic data warehousing methodologies many organizations create their own data warehousing methodology to best suit their organizational needs.


I have created several data warehouses for many organizations in both the public and private sectors. I am hoping to use this blog site as a resource for those entering the field of data warehousing to learn the fundamentals of Data Warehousing as well as providing some tips and tricks for those interested in optimizing their data warehouse. I hope that you find this information informative. I will be posting a new article each Wednesday and I look forward to your feedback and discussion about the weekly topic. Please come back each Wednesday for more information about Data Warehousing and Business Intelligence.

 

Jim McHugh
VP, Information & Technology Services
NuWave Solutions
jmchugh@nuwavesolutions.com

2 Comments

  1. Tony D. says:

    Jim,

    Not a techie…but do you think that concentration of any IT resources in one location increases the risk of bad actors compromising the security? If they break in to the central site, they get everything rather than if data was distributed over different systems or servers.

    I am still nervous about putting everything in the cloud (which to me is a big data, program, and file server warehouse).

    Thanks for your thoughts.

    Tony DeCristofaro

    • Jim McHugh says:

      Tony,

      I understand your apprehension concerning the use of the cloud for the centralized storage of data, but the benefits far outweigh the risks. In fact, I feel that a central, cloud-based, data warehouse is safer and more economical than an in-house managed, federated warehouse. These are the warehouses which hit the front page of the Wall Street Journal because the internal IT staff often lacks the training and discipline to properly secure the sensitive data held in their data structures. Using cloud-based solutions often forces one to adhere to the best practices of data security because the cloud provider wants to ensure that they are not breached and have their reputation stained by a data breach. I hope this quick response helps alleviate your fears.

      Jim