A data warehouse is a large, central store of data from disparate source systems across the enterprise optimized for analysis and reporting. The data is extracted from the source transaction based application databases, transformed into new data structures through the applying a set of rules and functions against the extracted data before loading the transformed data into the data warehouse data structure which is optimized for reporting. This central data store provides the business user with a snapshot of the business at a point-in-time. The data warehouse is used in making both strategic and tactical business decisions.
Data warehouses hold large quantities of source system data over long periods of time. This time-variant data contained in the data warehouses provide the foundation upon which historical analytical and reporting applications are built upon. These applications provide the organization’s senior leadership with a quick and holistic view of the success of the organization over time. Data warehouses can also be used and the source data used for researching an anomaly in the data. This researching of anomalies is often called data mining or data discovery. The information delivered from the data warehouse helps the senior leaders make informed decisions based on facts.
There are two main data warehousing methodologies upon which all other data warehousing methodologies are built. These are the methodologies proposed by Bill Inmon and Ralph Kimball, the two most influential leaders in the formation of data warehousing as a business solution. These two gentlemen see the data warehouse from two very different perspectives:
“A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.” (Inmon, 1995). This is a top-down methodology seeking to provide the “single version of the truth” for each entity the business is measuring in a relational format. Only after all of the full collection of atomic data is collected from the source systems can the data marts for analytics and reporting be created. These data marts are often customized for specific departments or user groups.
“A copy of transaction data specifically structured for query and analysis” (Kimball, 2002). This is a bottom-up methodology seeking to build small dimensional data marts of fact tables (measures) surrounded by dimension tables (entities) for specific departments of user groups. Each data mart provides the data for the analytics and reporting tools. Kimball sees the data warehouse as the collection of all of the data marts in the organization.
Inmon’s methodology sees the data warehouse as a single, colossal repository for enterprise data stored in a relational model, where Kimball’s methodology sees the data warehouse as a collection of subject area data marts, each stored as a dimensional model.
Although most organizations try to use the Kimbal Dimensional Methodology, most organizations try to either create their own methodology by taking the best of both methodologies to meet their organization’s specific needs or they use a methodology put forth by a software vendor whose tool meets their current data warehousing needs.
Today’s data warehouses play an important role in the success of today’s medium to large businesses because of the timely information they provide to the senior leaders of an organization. Although there are two basic data warehousing methodologies many organizations create their own data warehousing methodology to best suit their organizational needs.
I have created several data warehouses for many organizations in both the public and private sectors. I am hoping to use this blog site as a resource for those entering the field of data warehousing to learn the fundamentals of Data Warehousing as well as providing some tips and tricks for those interested in optimizing their data warehouse. I hope that you find this information informative. I will be posting a new article each Wednesday and I look forward to your feedback and discussion about the weekly topic. Please come back each Wednesday for more information about Data Warehousing and Business Intelligence.