Kyle Eckenrode

  • Building Qlik Sense Extensions that can Export and Snapshot
    The Fluff If you’ve ever built an extension in Qlik Sense focused on data visualization, you know how cool it is to harness the power of the Qlik associative engine. It is important for such visualizations to integrate with the user experience and really feel like Qlik, both in style and functionality. Yet many complex extensions suffer functional drawbacks because they fail to overcome two major hurdles: the ability to export to PDF/Image through the right-click menu, and the ability to function as a snapshot within a Qlik Data Story. Though these issues share the same root cause, that cause proved incredibly difficult for me to track down. It boils down to some undocumented changes in a couple of key Qlik JavaScript objects. To see the result of my research and for instructions on how to handle these issues correctly, skip down to The Stuff below. My path started by trying to “turn on” the export functionality for a table-like extension I was developing. The Qlik help pages helpfully told me to add a property named “export” to […]
  • Client Identification Using Custom Request Headers in Spring Boot
    One of the key building blocks of NuWave’s advanced predictive analytics solutions is a service management platform we built called Dex, which is short for Deus Ex Machina. Dex was built as a collection of microservices using Spring Boot and is responsible for coordinating the execution of complex workflows which take data through acquisition, ingestion, transformation, normalization, and modeling with many different advanced machine learning algorithms. These processing steps are performed with a large number of technologies (Java, Python, R, Knime, AWS SageMaker, etc.) and are often deployed as Docker containers for execution on dedicated servers, on EC2 instances, or as AWS Fargate tasks. Dex seamlessly manages the coordination of the execution of all these ephemeral tasks across these various technologies while also providing runtime configuration, secure credential management, state management, and storage provisioning for each executing task. This has led to the affectionate nickname of “Cat Herder” for this critical technology at the heart of our advanced analytical solutions. With such a complex processing workflow spread across multiple underlying technologies, we faced the challenge of […]
  • Loading Transaction Fact Tables
    This blog post will focus on loading transaction fact tables; subsequent posts for periodic and accumulating snapshots will follow in the coming weeks. Loading fact tables is very different from loading dimensions. First, we need to know the type of fact we are using. The major types of facts are transaction, periodic snapshot, accumulating snapshot and time-span accumulating snapshots. We also need to know the grain of the fact, the dimensional keys which are associated with the measurement event. Let’s say we want to measure product sales by customer, product, and date attributes. The source transaction system may provide the following associated dimensions: DIM_CUSTOMER, DIM_PRODUCT, and DIM_DATE. Here is an example of a set of product sale transaction records (a minimal Python sketch of the surrogate-key lookup appears after this list):
    Product_ID  Customer_ID  Transaction_Date  QTY  EXT_Price  Discount  Total
    55          1415         4/10/2019 10:01    3      38.97         5  37.02
    72          1415         4/10/2019 10:01    5      39.95         5  37.95
    120         1415         4/10/2019 10:01    1      12.99         5  12.34
    17          3457         4/10/2019 10:09    3      44.97         0  44.97
    57          8322         4/10/2019 10:10    6      35.94        10  32.35
    106         1194         4/10/2019 10:17    3      23.97         0  23.97
    […]
  • Data Preparation for Machine Learning: Vector Spaces
    Machine learning algorithms often rely on certain assumptions about the data being mined. Ensuring data meet these assumptions can be a significant portion of the preparation work before model training and predicting begins. When I began my data science journey, I was blissfully unaware of this and thought my preparation was done just because I had stuffed everything into a table. Feeding these naïvely compiled tables into learners, I wondered why some algorithms never seemed to perform well for me. As I began digging into the algorithms themselves, many referred to vector space operations as being fundamental to their function. Answering the simple question of “what is a vector space” took me on quite an enlightening journey. Vector spaces are really a neat tool for organizing data, and building one requires more than getting all your data into a table (a small Python sketch appears after this list). Perhaps most mind-bending is that a vector space can be constructed from anything, even if mortals like myself tend to just use numbers and categorical values. I believe part of the reason I was late to understanding […]
  • Oracle Certified Professional Exam Prep
    How I prepared for (and passed) the Oracle Certified Professional exam.
  • Beyond Recommender Systems: Alternative Applications of Matrix Factorization (Part 1)
    Because of their utility in retail and business systems, Recommender Systems are one of the most popular applications of predictive analytics. A Recommender System is a process that leverages data from the behaviors of individuals and attempts to make personalized predictions of a targeted individual’s future behaviors. Well-known and now obvious applications of a Recommender System include predicting a customer’s preferences for products they may buy so that those products can be recommended for purchase. Another well-known and studied application is the prediction of movies that are of interest to an individual based upon their own movie viewing habits and the viewing habits of other individuals who may share some common attitude or trait that indicates a shared interest in specific movies. Though there are several methods for implementing a Recommender System, Collaborative Filtering has become the de facto standard. Collaborative Filtering is an approach to Recommender Systems that is based on a machine learning technique called Matrix Factorization (sketched briefly in Python after this list) and was made famous within the data science community through its application in the one-million-dollar winning solution for […]
  • Updating Type II Slowly Changing Dimensions
    In this blog post I will provide an example of how I implement Type II dimensions in my data warehouses. The process I go through when refreshing data into a Type II dimension is to first land the data from the source system. I always create a separate schema to hold the data as it is presented from the source system, with no transformations. The tables and columns are named exactly as they are in the source system. The basic idea here is to quickly off-load the data from the source system, thereby preventing any long-running or resource-intensive processes from impacting the source application while providing a link back to the source system. Once all of the data is landed, I can proceed to the transformation and staging step. In this phase, I create staging tables for all of my dimensions. These tables look similar to the dimension tables but they do not have a primary key or most of the audit columns (a minimal Python sketch of the expire-and-insert step appears after this list). Here is an example of a regular dimension table […]
  • Introduction to KNIME
    Let’s say you’ve been tasked to pull data from a variety of data sources on a monthly basis, manipulating and processing it to create insights. At first you might look to powerhouse software like Informatica, but what if you don’t really need something that heavy and your budget is extremely limited? You need an ETL tool to get you from point A to point B accurately and reasonably quickly, something that’s easy to learn but has the depth and flexibility for more advanced processing and customization. Enter KNIME. Up front, KNIME presents itself as a tool designed for data analysts, emphasizing visual clarity and usability over raw power; where Informatica will perform multiple transformation types in a single Expression node, KNIME has a node for each step. This may be a turnoff for some coming from enterprise-level software, but the enhanced control offered by KNIME is extremely valuable and can shave hours off your development and debugging. A major benefit of KNIME is that it allows you to see how your data changes at every […]
  • Joining Fact Tables
    Joining fact tables can be done, but there are inherent risks, so you need to be careful whenever a join between fact tables is required. In the following simple scenario, we have a fact of authors to articles and a separate fact of articles to pageviews. Our customer has asked for the ability to 1) find the authors who have provided the most content in a given time period, 2) find the articles which have the greatest number of pageviews for a given time period and 3) find the authors with the highest number of pageviews in a given time period. Our data model can be seen here: The first two questions can be answered with a simple select statement joining a single fact table with the date dimension and the other associated dimensions. TOP 10 AUTHORS BY ARTICLE COUNT TOP 10 ARTICLES BY MONTH The third requirement will compel us to join two fact tables to get the answer (a small Python sketch of the aggregate-then-join approach appears after this list). Remember, to join fact tables you need at least […]
  • Types of Data Models
    A conceptual model is a representation of a system, made of the composition of concepts which are used to help people know, understand, or simulate a subject the model represents.[i] In dimensional modeling this is the most abstract level of modeling. Here we are only worried about the entities (dimensions) and their relationship to each other (facts). No attributes are required in the conceptual model as we are trying to work with the customer who is not an expert in databases or data warehousing but they do understand how the entities are related in their business. The conceptual model will help us to ensure we have captured the full list of dimensions and facts. If we are missing an entity or a relationship the customer will be able to provide us with that information. Now that we have completed a successful conceptual data model, we need to create a logical model. A logical model is a fully attributed conceptual model. The attributes are fully spelled out with no abbreviations. High-level data types (string, number, date, etc.) are […]
  • Data Warehouse Design Techniques – Constraints and Indexes
    In this week’s blog, we will discuss constraints and indexes. In data warehousing, like in life, constraints are things we love to hate. Constraints keep us from making mistakes, which in most cases is a good thing, until we come across an anomaly which needs to be addressed but the constraints prevent this from happening. Most of the time indexes help us to find data faster, but that is not always the case.   What are Indexes? Indexes are data structures which hold field values from the indexed column(s) and pointers to the related record(s). This data structure is then sorted and binary searches are performed to quickly find the record.   When should I Use Indexes? In general, you use indexes when you have a large data set. Small lookup or reference tables are bad candidates for indexes. In most cases the database can more efficiently select all of the information from the table without ever using the index, making the index useless. As a rule of thumb, let the SQL be your guide. […]
  • Data Warehouse Design Techniques – Aggregates
    In this week’s blog, we will discuss how to optimize the performance of your data warehouse by using aggregates. What are Aggregates? Aggregates are the summarization of fact-related data for the purpose of improved performance. There are many occasions when the customer wants to be able to quickly answer a question where the data is at a higher grain than we collect. To avoid slow responses caused by the system summing data up to higher levels, we can pre-calculate these requests, storing the summarized data in a separate star. Aggregates can be considered to be conformed fact tables since they must provide the same query results as the detailed fact table. When should I use Aggregates? Aggregates are to be considered for use when the number of detailed records to be processed is large and/or the processing of the customer queries begins to impact the performance of the BI application. Customers of BI applications expect an application which is very responsive to their queries. Aggregates are the way to meet this implied requirement. Aggregate Example […]
  • Data Warehouse Design Techniques – Derived Schemas
    Getting the correct answer is one of the most important requirements of a data warehouse, but this is only part of the requirement. The full requirement is to provide the correct information to the user at the right time. The information is no good to the user if they need the answer today but you need to write a custom report which will take a week to develop, test and deploy. Although you will be able to provide the correct answer, it will be late and therefore of limited to no value to the decision-making process. How can we address these issues of flexibility and timeliness while building our data warehouse? In this blog article, I will demonstrate how derived schemas can address these challenges. Snapshot Fact & Accumulating Snapshot Fact Tables We have discussed these data structures in my previous blog articles titled “Snapshot Fact Tables” and “Accumulating Snapshot Fact Tables”. In those articles, I showed how we improve performance by moving the aggregation and computation of the metrics from the query to […]
  • Loading Hierarchical Bridge Tables
    This blog article is a follow-up to the Ragged Hierarchical Dimensions article I posted a few weeks ago. In that article, I spoke of using a hierarchical bridge table at a high level; today I will discuss the nuances of this method in more detail. Let’s begin at the dimension table. In order to capture the data with the appropriate level of detail you will need to design your dimension with recursive fields. I have designed this dimension table as a type 2 dimension table and I include the recursive (parent) field as an attribute. This allows me to capture the point-in-time changes of the dimensional attributes while still being able to display the hierarchy. Most reporting tools cannot handle this recursive field, so we need to find a way to transform the recursive join into a relational join. To do this we will use a recursive query as part of the ETL process to create this relational join (bridge) table (a minimal Python sketch of the flattening walk appears after this list). The query will create a relational hierarchy consisting of each record in the hierarchy associated with […]
  • Data Warehouse Design Techniques – Accumulating Snapshot Fact Tables
    Today we will look at a different type of fact table, the accumulating fact. Accumulating Fact Tables Accumulating fact tables are used to show the progress of activity through a well-defined business process and are most often used to research the time between milestones. These fact tables are updated as the business process unfolds and each milestone is completed. The most common examples can be found in order and insurance processing. Here a single fact table may contain the important business process milestones upon which the organization wishes to measure performance. A different version of this same accumulating fact table may include the calculated lag time between the milestone dates as shown below. The lag (number of days) can be calculated from the claim date (total time to complete the milestone) or the lag can be calculated as the number of days since the last milestone was completed (days spent in this stage). The important part is to be consistent in calculating the lag within the snapshot fact. Capturing this lag data can make […]
  • Data Warehouse Design Techniques – Snapshot Fact Tables
    Last week we defined the three types of fact tables and have established a solid foundation in the definition and use of transactional fact tables. This week we will focus on periodic snapshot fact tables. Snapshot Fact Tables (Periodic) Snapshot fact tables capture the state of the measures based on the occurrence of a status event or at a specified point-in-time or over specified time intervals (week, month, quarter, year, etc.). The Snapshot Fact table and the Transactional fact table can be very similar in design. In the design above the sales are rolled up to the month in the Periodic Snapshot fact table. In this table, you are unable to drill down past the month, as the month is the lowest grain of this Periodic Snapshot fact table. Periodic Snapshot tables are also important when measuring those metrics which may not have a corresponding transaction, such as air temperature. We cannot capture transactions causing the air temperature to rise or fall but we can measure the temperature at predetermined points in time. As you […]
  • Data Warehouse Design Techniques – Fact Tables
    Now that we have established a solid foundation with dimension tables we will now turn our focus to the fact tables. Fact tables are data structures which capture the measurements of a particular business process. The measurements (quantity, amount, etc.) are defined by the collection of related dimensions. This collection of dimensional keys is called the grain of the fact. Types of Fact Tables The three basic fact table grains are the transactional, the periodic snapshot and the accumulating snapshot. Transactional fact tables are the most common fact in dimensional modeling. Transactional fact tables capture the measurement at its most atomic dimensional level, at the point-in-time of the transaction. This allows the fact table to provide robust dimensional grouping and roll-up and drill-down reporting capabilities to the business user. Periodic Snapshot fact tables capture the state of the metrics at a specified point-in-time. Periodic Snapshot Fact tables are used to quickly assess and review the performance of the measures over specified time intervals (week, month, quarter, year, etc.). As you can see the Periodic Snapshot […]
  • Data Warehouse Design Techniques – Bridge Tables
    There will be times when you arrive at a many-to-many relationship between a fact and a dimension. This many-to-many relationship is in violation of the dimensional modeling rules, so how do you handle this multi-valued relationship? Bridge tables (a small Python sketch of a weighted author bridge appears after this list). In this blog post I will take you through the importance and usefulness of bridge tables and how and when to use these data structures. Multi-Valued Dimensions Multi-valued dimensions are more common than most people think and are often “refactored” to a single-valued dimension. Coming from a publishing background, sales of articles come to mind as a multi-valued dimension. The sale of an article must have at least one author but could very easily have two or more authors. How do you handle this in the warehouse? You could define one author as the primary author or you could create a really long field and place all of the authors of the article in this field. But how can you find the total sales for all of the articles written by a single individual who […]
  • Data Warehouse Design Techniques – Ragged Hierarchical Dimensions
    In my last post, we discussed the creation of simple hierarchical dimensions. In this post, I will discuss how to handle those ragged hierarchies, those which can skip levels in the hierarchy. Snowflaking There may be instances where you wish to normalize the hierarchy into a snowflake. Below is a city, state/province, country hierarchy which can easily be converted into a snowflake model. Here we would simply take the list of all of the countries and place them into a country dimension, the state/provinces into a state/province table and finally a city into a city table. While loading these dimensions, you must remember to carry the parent and grandparent IDs along. Therefore, the city dimension will reference the state table where appropriate and the country table (always) while the state/province table will only reference the country dimension. Flatten Hierarchy One way to handle the ragged hierarchy is to flatten the hierarchy just as we did in the simple dimensional hierarchy and copy the grandparent level data down to the parent level. This copying of this […]
  • Data Warehouse Design Techniques – Simple Hierarchical Dimensions
    Today, continuing through our look at the advanced modeling techniques of dimensional data warehouses, we will take a look at simple hierarchical dimensions. Simple Hierarchical Dimensions Hierarchical dimensions are those dimensions which have a parent/child relationship. In simple hierarchies, every child has a parent at the level above with no skipping of levels. A date (day, month, quarter, year) is the most common example of a simple hierarchical dimension and one that is used by all dimensional data warehouses. The question for the data modeler is how to model the hierarchy. Let’s use the date dimension for some initial examples. The first and most common way to model the date hierarchy is to flatten the date into a single dimension where you have every piece of information you need concerning the date in a single location. In this flattened dimensional hierarchy, you can see all of the parent, grandparent, great-grandparent, etc. information of the date in a single table. Selecting a date will provide you with the detailed information about the date […]
  • Data Warehouse Design Techniques – Snowflaking
    We will continue our deep dive through advanced dimensional data warehouse design techniques by discussing snowflaking. Snowflaking Dimension Tables According to Dr. Kimball, dimensions should be designed as long, denormalized records. The reason for denormalization is to ensure maximum performance of the DW/BI queries. Snowflaking is the normalization of the dimension tables by removing all redundant attributes into separate dimension tables. The reason for normalizing is often the desire to “enhance” the design and reduce the storage costs of the warehouse. Although this normalized design appears to optimize the storage and maintenance of the warehouse, it actually adds maintenance complexity and introduces sub-optimal query performance for DW/BI queries. Reporting Using Snowflakes As I demonstrated in a previous blog, Type 2 Slowly Changing Dimensions (SCD) provide the most flexibility for BI reporting because of the ability to report on historical data, including point-in-time reports. Snowflaking can sometimes hide these changes within the snowflaked dimensions. Normalization, by definition, does not capture the transactional details of the modifications of the data. This lack of […]
  • Data Warehouse Design Techniques – Degenerate & Junk Dimensions
    Today we will look at the advanced dimensional data warehouse design techniques of Degenerate and Junk Dimensions. Degenerate Dimension Tables As I stated in an earlier blog post, degenerate dimensions are not physically implemented data structures. Degenerate dimension attributes exist in the fact table as a part of the primary key but have no corresponding dimension. Let’s look at the most common example of a degenerate dimension, the invoice. Invoices contain a lot of information which can be complex and sometimes difficult to model. All invoices contain information about products and customers, but some contain additional information like shipping information and product location (warehouse) information. Starting with a conceptual model for an invoice, our model may look like this: Some may want to try to create an invoice dimension to capture the invoice information in a single dimension, as can be seen in this logical model: While this model accurately depicts the invoice transaction, there are several issues with this model. The greatest among these issues would be that the invoice dimension can be […]
  • Data Warehouse Design Techniques – Rapidly Changing Dimensions
    In my last blog post, I showed how to capture changes for slowly changing dimensions. Today we will see how to handle rapidly changing dimensions. Rapidly Changing Dimensions Rapidly changing dimensions are dimensions where the attribute values of the dimension change frequently, causing the dimension to grow rapidly if you have designed the dimension to capture the changes as a Type 2 dimension. The rapid growth of this dimension will impact maintenance and performance as the dimension grows. Let’s take a look at solutions to this challenge. Mini-Dimension (SCD Type 4) Mini-dimensions contain the rapidly changing attributes of the original dimension and are treated as a stand-alone dimension. This dimension will band the attributes together instead of treating them as individual pieces of data. The fact table can then connect to the mini-dimension based upon the data provided, or this mini-dimension could be connected to the “parent” dimension using the surrogate key as a Type 1 or Type 2 SCD attribute. An example of this can be seen in a customer table where the […]
  • Data Warehouse Design Techniques – Slowly Changing Dimensions
    In my last blog post, I demonstrated the importance of conformed dimensions to the flexibility and scalability of the warehouse. This week we will discuss the importance of capturing the dimensional change in slowly changing dimensions. Slowly Changing Dimensions Slowly Changing Dimensions (SCD) are the most commonly used advanced dimensional technique used in dimensional data warehouses. Slowly changing dimensions are used when you wish to capture the changing data within the dimension over time. There are three methodologies for slowly changing dimensions.   Type 1 – For this type of slowly changing dimension you simply overwrite the existing data values with new data values. This makes the updating of the dimension easy and limits the growth of the dimension table to only new records. The drawback of this is you lose the historical value of the data because the dimension will always contain the current values for each attribute. For instance, you have a store dimension which has an attribute for a geographic region. If there is a redesign in the regional boundaries some stores may move […]
  • Data Warehouse Design Techniques – Conformed Dimensions
    In my last blog post, I presented to you some of the more widely used advanced technical dimensional design techniques. Over the next several weeks, I will provide detail for each of the design techniques. Conformed Dimensions Conformed dimensions are those dimensions which have been designed in such a way that the dimension can be used across many fact tables in different subject areas of the warehouse. It is imperative that the designer plan for these dimensions as they will provide reporting consistency across subject areas and reduce the development costs of those subject areas via reuse of existing dimensions. The date dimension is an excellent example of a conformed dimension. Most warehouses only have a single date dimension used throughout the warehouse. Conformed Dimension – Challenges Structural Differences – in some cases you will have source tables which contain different table structures. These may include: additional or missing columns, columns with different data types, and columns named differently containing the same or similar data. These differences can usually be resolved easily but occasionally this […]
  • Advanced Dimensional Data Warehouse Design Techniques
    In an earlier blog post, I walked you through the basics of dimensional data warehouse design by introducing you to dimension tables, fact tables and star schema design. For this blog post, I will introduce some of the more widely used advanced dimensional design techniques which I will describe in detail in the coming weeks. DIMENSIONS Conformed Dimensions Conformed dimensions are those dimensions which have been designed in such a way that the dimension can be used across many fact tables in different subject areas of the warehouse. Slowly Changing Dimensions Slowly changing dimensions are the second most commonly used advanced dimensional technique in dimensional data warehouses. Slowly changing dimensions are used when you wish to capture the changing data within the dimension over time. Hierarchical Dimensions Hierarchical dimensions are those dimensions which have an associated hierarchy. A date (day, month, quarter, year) is an example of a hierarchical dimension and one that is often used by many data modelers and data warehouses. Snowflake Dimension Tables The removal of all redundant dimensional […]
  • Data Warehousing Documentation Review
    One of the most important artifacts delivered as a part of any project is documentation. Documentation is the bane of IT development teams. Most developers feel “if it was hard to write, it should be even harder to understand and modify.” When developing and delivering a data warehouse, documentation is critical to the success of the project. This documentation will help both the business users and the technical teams understand the source, the transformation and the storage of the data they need to consume. These documents are the foundation upon which the warehouse will be built. Business Requirements Document (BRD) This is a formal document which requires review and approval. The BRD captures the problem the organization is trying to solve and any restrictions that need to be considered. Essentially, the BRD defines the business user’s high-level specifications (scope) of what they are hoping to achieve from the data warehouse, delivered through a BI reporting or analytic tool, along with any assumptions, constraints and risks which impact or may impact the project. This document […]
  • Data Warehousing Best Practices
    There are many times when you have completed a task only to say, “I wish I had known that before I started this project.” Whether it is fixing the brakes on your car, completing a woodworking project or building a data warehouse, best practices should always be observed to ensure the success of the project. These are seven of the best practices I have observed and implemented over the years when delivering a data warehouse/business intelligence solution. As you will see, most of these are not technical solutions but focus more on the soft skills needed to ensure the success of these long-duration and expensive solutions. Define Standards Before Beginning Design I liken this practice to the “measure twice, cut once” adage. It is important that all of the documentation and physical deliverables of the project be defined at the outset of the project. Naming standards, documentation standards, coding standards, weekly status reports, release deliverables, etc. should all be defined before the kick-off meeting. Examples of these documents should be a part […]
  • Designing a Dimensional Data Warehouse – The Basics
    Last week I wrote about Relational (Atomic) Data Warehouses and how to create these data structures. This week we will look at Dimensional Data Warehouses and how they differ from the Relational Data Warehouse. Creating a dimensional data warehouse is very different from creating a relational data warehouse. In the relational data warehouse, you are looking to store every atomic element which the organization has deemed to be important from the source applications and to store this data in a relational structure. Relational data structures are optimized for transactional speed and for limiting the use of disk space, which is why this methodology is so prevalent in transaction-based systems. Dimensional models are specifically optimized for the delivery of reports and analytics. Dimensions – What are they? Dimensional Data Warehouses must have dimensions, right? So what are dimensions? Think of dimensions as containers of adjectives which describe the dimension and will help us limit the fact information we wish to see. Dimension tables can contain a lot of columns if the dimension is robust […]
  • Designing a Relational Data Warehouse
    The use of relational modeling is prevalent in today’s business landscape as most applications are built using relational databases. Relational modeling is based on mathematical set theory and was first described in 1969 by E.F. Codd. He stated that a grouping of elements (tuples) could be related to other tuples in such a way so as to limit the amount of data stored for a specific tuple and therefore optimize the storage of such data. This sounds difficult! However, it is easy. Let’s look at a real world example. Take a look at the information collected about John Smith below. John Quincy Smith DOB: 12/31/1969 National ID: 123-45-6789 1234 Main Street, Anytown, VA 20001 (h) 8000 Corporate Drive, Suite 1022, Sombertown, VA 20101 USA (w) jsmith@company.com (w) john.smith@gmail.com (p) jqsmith@yahoo.com (p) (703) 555-1212 (w) (703) 212-1555 (h) (571) 234-9876 (c) 1234-5678-9012-3456 (Visa) Could all of this information be stored in a single table? Yes, absolutely. However, storing data in a single table is inefficient. In this example, we added a second personal email address column. […]
  • Data Warehouse Design Methodologies
    There are two traditional data warehouse design methodologies that came of age in the 1990’s: Bill Inmon’s Top-Down Atomic Data Warehouse and Ralph Kimball’s Bottom-Up Dimensional Data Warehouse. These methodologies have been used over the past 20 years to create informational data stores for organizations seeking to leverage their data for corporate gain. In this article, we will compare and contrast these two methodologies. Atomic Data Warehouse – Bill Inmon Bill Inmon’s Atomic Data Warehouse approach is strategic in nature and seeks to capture all of the enterprise data in 3rd Normal Form and store all of this atomic data in the data warehouse. Inmon defines a data warehouse as a subject-oriented, non-volatile, time-variant and integrated data source. Let’s break down each of these descriptors of Inmon’s Data Warehouse. Subject-Oriented – the data is organized so that the data, related by subject area, is linked together. Non-Volatile – once data is entered it is never updated or deleted; all data is retained for future reporting needs. Time-Variant – because of […]
  • Requirements for a Successful Data Warehouse Project
    In order to have a successful data warehousing project you will need to address the following five challenges: #1. Executive Sponsorship All projects need executive sponsorship. Without senior executive leadership, most corporate projects will likely fail. Data warehouse projects are no different; in fact, these projects are even more in need of executive sponsorship than most business initiatives. It is imperative that the organization’s corporate leaders infuse the enterprise with excitement concerning the success of a data warehousing project. These executives must also tie the success of the project to each and every one of their direct reports. This will ensure that all of the business units of the organization are linked to the success of the Data Warehousing project. Organizational commitment from the top down will eliminate data fiefdoms, those individuals/departments who refuse to share their data or information with the larger organization for fear of loss of power or influence. These fiefdoms often work subtly to sabotage the success of the warehouse project. They do this out of fear, afraid that sharing their information […]
  • Data Warehousing – The Basics
    What Are Data Warehouses? A data warehouse is a large, central store of data from disparate source systems across the enterprise, optimized for analysis and reporting. The data is extracted from the source transaction-based application databases and transformed into new data structures by applying a set of rules and functions against the extracted data, before the transformed data is loaded into the data warehouse data structure which is optimized for reporting. This central data store provides the business user with a snapshot of the business at a point-in-time. The data warehouse is used in making both strategic and tactical business decisions. Why Are Data Warehouses Important? Data warehouses hold large quantities of source system data over long periods of time. This time-variant data contained in the data warehouses provides the foundation upon which historical analytical and reporting applications are built. These applications provide the organization’s senior leadership with a quick and holistic view of the success of the organization over time. Data warehouses can also be used as the source data for […]
  • Introducing Vilocity 2.0….
    By: Ryan Legge For those of you who missed our release on Vilocity 2.0, you can read it here.  As I am sure you know from having read about this revolutionary solution, Vilocity uses an open source framework to combine two powerful COTS BI/Data Discovery products for data analysis and reporting: Oracle Business Intelligence Enterprise Edition (OBIEE) and Oracle Endeca Information Discovery (OEID). This framework combines these two powerful tools into a single user experience, allowing users to seamlessly navigate from Business Intelligence to Information Discovery. Several key features of Vilocity include: A common user interface across both OBIEE and OEID The ability to filter and navigate both OBIEE and OEID components seamlessly utilizing OEID’s Guided Navigation A true personalized experience that allows users to combine both OBIEE and OEID components into a single dashboard We could talk all day about the benefits of utilizing the Vilocity framework to provide a truly unified Analytics capability. But for this post, we want to talk about one of the new features of Vilocity 2.0, the unique Portlet […]
  • Introduction to Endeca Studio 3.1
    By: Matthew T. Farrington The Oracle Endeca Information Discovery (OEID) platform provides an incredibly powerful, intuitive, and fast means of digging through complex data-sets of varied structures. All of us at NuWave Solutions celebrated when we received our Oracle Partner Network (OPN) status specializing in Oracle Endeca Information Discovery (OEID) in December 2013. The best way to see the capabilities of software is to build a simple application and start analyzing some data. Fortunately, you don’t need a team of developers, database experts, or front-end designers to construct a robust data discovery application. Endeca Studio 3.1 comes with a simple application builder tool that takes a given data-set and produces a simple application which you can subsequently customize to your desire. This tutorial will walk you through application creation and component configuration, starting you down the path of data discovery. To quickly become familiar with Endeca Studio 3.1 we are going to create a new application that leverages Chicago crime statistics to examine why so many police reports do not result in an arrest. For the […]
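The posts above only hint at their implementations, so a few minimal Python sketches follow; they are illustrations under stated assumptions, not code from the posts. First, the surrogate-key lookup behind loading a transaction fact table (“Loading Transaction Fact Tables”): the dimension and column names here are hypothetical, and pandas stands in for whatever ETL tool actually performs the load.

    import pandas as pd

    # Hypothetical source transactions at the grain of the post's example.
    transactions = pd.DataFrame({
        "product_id": [55, 72, 120],
        "customer_id": [1415, 1415, 1415],
        "transaction_date": pd.to_datetime(["2019-04-10 10:01"] * 3),
        "qty": [3, 5, 1],
        "ext_price": [38.97, 39.95, 12.99],
        "discount": [5, 5, 5],
        "total": [37.02, 37.95, 12.34],
    })

    # Hypothetical dimension tables with surrogate keys already assigned.
    dim_product = pd.DataFrame({"product_key": [1, 2, 3], "product_id": [55, 72, 120]})
    dim_customer = pd.DataFrame({"customer_key": [10], "customer_id": [1415]})
    dim_date = pd.DataFrame({"date_key": [20190410],
                             "calendar_date": [pd.Timestamp("2019-04-10")]})

    # Resolve each natural key to its surrogate key, then keep only the surrogate
    # keys and the measures -- the grain of the transaction fact.
    fct_sales = (
        transactions
        .assign(calendar_date=lambda df: df["transaction_date"].dt.normalize())
        .merge(dim_product, on="product_id")
        .merge(dim_customer, on="customer_id")
        .merge(dim_date, on="calendar_date")
        [["product_key", "customer_key", "date_key", "qty", "ext_price", "discount", "total"]]
    )
    print(fct_sales)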
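For “Data Preparation for Machine Learning: Vector Spaces”, a sketch of the most basic preparation step: mapping a table that mixes numeric and categorical values into a purely numeric vector space via one-hot encoding. The feature names are invented.

    import numpy as np
    import pandas as pd

    # A hypothetical table mixing numeric and categorical values.
    records = pd.DataFrame({
        "age": [34, 51, 29],
        "region": ["east", "west", "east"],
        "product": ["book", "toy", "book"],
    })

    # One-hot encode the categorical columns so each record becomes a point
    # (vector) in a purely numeric vector space.
    vectors = pd.get_dummies(records, columns=["region", "product"], dtype=float)
    X = vectors.to_numpy()

    # Once the records live in a vector space, operations such as Euclidean
    # distance between records are well defined.
    print(vectors)
    print("distance between record 0 and record 1:",
          round(float(np.linalg.norm(X[0] - X[1])), 2))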
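For “Beyond Recommender Systems”, a toy illustration of matrix factorization, the technique behind Collaborative Filtering: factor a small ratings matrix into user and item factors by gradient descent on the observed cells. The ratings and hyperparameters are made up.

    import numpy as np

    rng = np.random.default_rng(0)

    # Tiny, made-up user x item ratings matrix; 0 marks an unobserved rating.
    R = np.array([
        [5, 3, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [0, 1, 5, 4],
    ], dtype=float)
    observed = R > 0

    k = 2                                             # latent factors
    U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factor matrix
    V = rng.normal(scale=0.1, size=(R.shape[1], k))   # item factor matrix
    lr, reg = 0.01, 0.02                              # learning rate, regularization

    # Gradient descent on squared error over the observed cells only.
    for _ in range(2000):
        E = np.where(observed, R - U @ V.T, 0.0)      # residuals on observed ratings
        U += lr * (E @ V - reg * U)
        V += lr * (E.T @ U - reg * V)

    # U @ V.T now approximates R and fills in the unobserved cells,
    # which is exactly the "predict what this user would rate" step.
    print(np.round(U @ V.T, 2))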
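For “Updating Type II Slowly Changing Dimensions”, a sketch of the expire-and-insert step of a Type II refresh: compare staged rows to the current dimension rows, end-date the rows whose tracked attributes changed, and insert new versions. Column names such as effective_date, end_date and current_flag follow a common convention and are not necessarily the post’s exact layout.

    import pandas as pd

    HIGH_DATE = pd.Timestamp("9999-12-31")
    load_date = pd.Timestamp("2019-04-10")

    # Existing Type II dimension: one current row per customer.
    dim_customer = pd.DataFrame({
        "customer_key": [1, 2],
        "customer_id": [1415, 3457],
        "region": ["east", "west"],
        "effective_date": [pd.Timestamp("2018-01-01")] * 2,
        "end_date": [HIGH_DATE] * 2,
        "current_flag": [True, True],
    })

    # Staged rows from the landing area: customer 1415 has moved regions.
    staging = pd.DataFrame({"customer_id": [1415, 3457], "region": ["west", "west"]})

    # Find current rows whose tracked attribute differs from the staged value.
    current = dim_customer[dim_customer["current_flag"]]
    compare = current.merge(staging, on="customer_id", suffixes=("_old", "_new"))
    changed = compare[compare["region_old"] != compare["region_new"]]

    # Expire the old versions of the changed rows...
    expire = dim_customer["customer_key"].isin(changed["customer_key"])
    dim_customer.loc[expire, "end_date"] = load_date
    dim_customer.loc[expire, "current_flag"] = False

    # ...and insert new versions with fresh surrogate keys.
    next_key = dim_customer["customer_key"].max() + 1
    new_rows = pd.DataFrame({
        "customer_key": range(next_key, next_key + len(changed)),
        "customer_id": changed["customer_id"].to_numpy(),
        "region": changed["region_new"].to_numpy(),
        "effective_date": load_date,
        "end_date": HIGH_DATE,
        "current_flag": True,
    })
    dim_customer = pd.concat([dim_customer, new_rows], ignore_index=True)
    print(dim_customer)

In SQL the same step is typically an UPDATE of the expired rows followed by an INSERT of the new versions, run inside a single transaction.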
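For “Joining Fact Tables”, one careful way to combine two facts: summarize the pageview fact to the shared article grain first and then join, so no measure is double-counted. The tables are hypothetical.

    import pandas as pd

    # Hypothetical fact rows, already filtered to the month of interest.
    fct_author_article = pd.DataFrame({
        "author_id":  [1, 1, 2],
        "article_id": [100, 101, 102],
    })
    fct_article_pageviews = pd.DataFrame({
        "article_id": [100, 100, 101, 102],
        "pageviews":  [500, 250, 300, 900],
    })

    # Summarize the pageview fact to the article grain first, so joining it
    # to the author/article fact cannot double-count the pageview measure.
    views_by_article = (fct_article_pageviews
                        .groupby("article_id", as_index=False)["pageviews"].sum())

    author_views = (fct_author_article
                    .merge(views_by_article, on="article_id")
                    .groupby("author_id", as_index=False)["pageviews"].sum()
                    .sort_values("pageviews", ascending=False))
    print(author_views)   # authors ranked by total pageviews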
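For “Loading Hierarchical Bridge Tables”, the flattening walk that turns a recursive parent/child dimension into ancestor/descendant bridge rows. This shows only the traversal, not the type 2 handling, and the hierarchy is invented.

    from collections import defaultdict

    # Hypothetical child -> parent links from a recursive dimension
    # (None marks the top of the hierarchy).
    parent_of = {"CEO": None, "VP Sales": "CEO", "VP Eng": "CEO",
                 "Sales Rep": "VP Sales", "Engineer": "VP Eng"}

    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)

    def bridge_rows(node):
        """Yield (ancestor, descendant, levels_removed) rows for one subtree."""
        yield (node, node, 0)                  # every member bridges to itself
        for child in children[node]:
            for _, descendant, levels in bridge_rows(child):
                yield (node, descendant, levels + 1)

    # The bridge is the full ancestor/descendant closure: run the walk from
    # every member, not just the root.
    bridge = [row for member in parent_of for row in bridge_rows(member)]
    for ancestor, descendant, levels in bridge:
        print(f"{ancestor:10s} -> {descendant:10s} levels removed: {levels}")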
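For “Bridge Tables”, a weighted author-group bridge that lets article sales roll up to individual authors without double counting. The keys, names and weights are illustrative.

    import pandas as pd

    # Hypothetical fact: each article sale points at an author *group*, not an author.
    fct_article_sales = pd.DataFrame({
        "article_id": [100, 101],
        "author_group_key": [1, 2],
        "sales_amount": [200.0, 90.0],
    })

    # Bridge: one row per (group, author) with a weighting factor per group.
    bridge_author_group = pd.DataFrame({
        "author_group_key": [1, 1, 2],
        "author_key": [10, 11, 10],
        "weight": [0.5, 0.5, 1.0],
    })

    dim_author = pd.DataFrame({"author_key": [10, 11], "author_name": ["Smith", "Jones"]})

    # Allocate each sale across its authors via the bridge, weighting to avoid
    # double counting, then total by author.
    sales_by_author = (fct_article_sales
                       .merge(bridge_author_group, on="author_group_key")
                       .assign(allocated=lambda df: df["sales_amount"] * df["weight"])
                       .merge(dim_author, on="author_key")
                       .groupby("author_name", as_index=False)["allocated"].sum())
    print(sales_by_author)   # Smith gets 100 + 90 = 190, Jones gets 100

The weights must sum to 1.0 within each group, otherwise the allocated totals will not reconcile back to the fact table.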
