Data Warehousing Best Practices

Designing a Dimensional Data Warehouse – The Basics
November 30, 2016
Data Warehousing Documentation Review
December 21, 2016
There are many times when you completed a task only to say “I wish I would have known that before I started this project” Whether it is fixing the breaks on your car, completing a woodworking project or building a data warehouse, best practices should always be observed to ensure the success of the project. These are seven of the best practices I have observed and implemented over the years when delivering a data warehouse/business intelligence solution. As you will see, most of these are not technical solutions but focus more on the soft skills needed to ensure the success of these long in duration and expensive solutions.

 

Define Standards Before Beginning Design

I liken this practice to the “measure twice, cut once” adage. It is important that all of the documentation and physical deliverables of the project be defined at the outset of the project. Naming standards, documentation standards, coding standards, weekly status reports, release deliverables, etc. should all be defined before the kick-off meeting. Examples of these documents should be a part of the addendum of the presentation, so the customer knows that you are prepared for this project, and they know what to expect at each milestone along the way. This seemingly small step lays the foundation to the overall success of the project from the customer’s point of view.

 

Establish Data Governance Council (if possible)

A recent KPMG survey of CEOs noted that 77% of CEOs said that they had concerns about internal data quality. Such a high number makes me wonder how that 77% of CEOs make their decisions for the success of their company.

The way to address this challenge is to establish a Data Governance Council as a part of the warehousing project. A data governance council can be critical to the success of a data warehousing project. The council is responsible for ensuring data integrity, and quality before the data is ingested into the data warehouse. The members of the council are usually the disparate siloed data experts, data owners and data specialists from the different parts of the organization. This eclectic group of individuals will feel empowered to keep their data clean and accurate because they know the others in the council are doing the same, and they see the positive business results from sharing their data. They will develop policies for data security, sharing and retention. In the end, this group will ensure the data ingested into the warehouse for reporting and analytics is of the highest quality, ensuring your CEO is in the 23% who trust their data to make their business decisions.

 

Make Friends

This is something we forget after leaving kindergarten. We often see the other members of the team, network, storage UNIX/LINUX and Windows engineers, Java, C# and BI developers, and even the customer as obstacles or even worse, enemies. These people, like you, are doing their job to the best of their ability. Most don’t see or understand the business need for a data warehouse; they only see their workload increase and/or their job changing in some way. This is upsetting to most people. It makes them feel disengaged and disrespected and disengaged and disrespected employees have been the ruin of many data warehouse projects. These individuals often appear to be helpful but often leave out critical details needed for the success of the project. To address this challenge, you must work to communicate the value that each member of the team brings to the project. Listen to their opinions, and where possible, include their ideas and, most importantly, give them credit. The establishment of teamwork amongst the team members is important to the success of most projects, but this building of friendships critical to the success of a project as large and long as a data warehousing project.

 

Focus on the Users Needs

The overarching reason for a data warehouse is to provide high quality, trusted information to the users quickly and efficiently. To do this correctly you must focus on the user requirements, not only to deliver what the users specifically requested but to provide them with enhanced capabilities to address the issues that they may not have fully articulated. When you listen to your constituents the results can be astounding; these users will become your best asset.

 

Delivery – Like Domino’s Only Slower (90 Days or Less)

In the 90’s and early 2000’s data warehouses were usually built by spending 6 to 18 months gathering detailed requirements. After all of the requirements were captured the data warehouse team would then go off for another 6 to 24 months and build the warehouse based on those requirements. Meanwhile, the needs of the business changed, and the requirements gathered so many months before are no longer valid when the warehouse is delivered. This was one of the main reasons why so many data warehousing projects failed to meet the user’s expectations. Therefore, we must be able to enhance the design of the data warehouse rapidly to address the changing business needs.

To address this shortfall data warehouse projects started to take on agile project management methodology aspects, where delivery of new and/or enhanced functionality, usually focused on a single subject area, is delivered every 30, 60 or 90 days. This allows the users to receive partial functionality and react to the delivered product. At this point, the users can continue with the schedule as defined or make modifications to the schedule based on this most recently delivered product. This methodology eliminates the long stretches of time between requirements gathering and product delivery and thereby provides the users with the agility to change tact when the business needs change.

 

Remember Performance

We have all heard the expression “speed kills,” well in data warehouses “slow = death.” We live in a fast society where instant coffee is not fast enough; web pages need to load in under 2 seconds, and business users needed information to make decisions yesterday. It is currently estimated that over 2.5 quintillion bytes of data is generated every day, so you must also plan for rapid growth of data stored in the warehouse. Therefore, storage optimization and data insert, update and select performance must be considered when designing a data warehouse and data marts. You must consider all of the performance options the modern databases, ETL tools, and BI/Analytics software provides.

 

Conclusion

Following the above rules will ensure your data warehouse project overcomes the initial inertia of a large project, meets your customer needs in a timeframe for them to react to the changing needs of the business while simultaniously delivering high performing BI reports and analytics.

2 Comments

  1. Raj Sharma says:

    You have written this post very well. Thank you for this share. It will definitely meet the customer satisfaction and their needs.