Blog

May 28, 2019

Loading Accumulating Snapshot Fact Tables

Often management looks for bottlenecks in corporate processes so that they can be streamlined or used as a measurement of success for the organization. To achieve these goals, we need to measure the time between two or more related events. The easiest way to report on this time-series process is to use accumulating snapshot facts. Accumulating snapshot facts are updatable fact records used to measure time between two or more related events. The most common example of this type of fact can be seen in order processing. Let’s take a look! Order processing consists of many serialized processes. […]
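To make the idea of an updatable fact record concrete, here is a minimal Python sketch. It is not taken from the post itself: the `OrderFact` class, its milestone column names, and the dates are hypothetical, chosen only to illustrate one row per order whose milestone dates are filled in as each event occurs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OrderFact:
    """Hypothetical accumulating snapshot row: one row per order."""
    order_id: int
    ordered_on: date
    shipped_on: Optional[date] = None    # unknown until the order ships
    delivered_on: Optional[date] = None  # unknown until the order arrives

    def days_between(self, start: str, end: str) -> Optional[int]:
        """Elapsed days between two milestones, or None if either is missing."""
        start_date, end_date = getattr(self, start), getattr(self, end)
        if start_date is None or end_date is None:
            return None
        return (end_date - start_date).days


# The row is created when the first milestone (the order) occurs.
fact = OrderFact(order_id=1, ordered_on=date(2019, 5, 1))
lag_before_shipping = fact.days_between("ordered_on", "shipped_on")

# The same row is updated, not re-inserted, when the next milestone occurs.
fact.shipped_on = date(2019, 5, 4)
lag_after_shipping = fact.days_between("ordered_on", "shipped_on")
```

Before the order ships, the lag is undefined (`None`); once the row is updated with the ship date, the time between the two milestones can be read directly from the single fact row.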
May 15, 2019

Building Qlik Sense Extensions that can Export and Snapshot

The Fluff If you’ve ever built an extension in Qlik Sense focused on data visualization, you know how cool it is to harness the power of the Qlik associative engine. It is important for such visualizations to integrate with the user experience and really feel like Qlik—both in style and functionality. Yet many complex extensions suffer functionality drawbacks from failing to overcome two major hurdles: the ability to export to PDF/Image through the right-click menu, and the ability to function as a snapshot within a Qlik Data Story. Though these issues have the same root cause, it proved incredibly difficult […]
May 1, 2019

Client Identification Using Custom Request Headers in Spring Boot

One of the key building blocks of NuWave’s advanced predictive analytics solutions is a service management platform we built called Dex, which is short for Deus Ex Machina. Dex was built as a collection of microservices using Spring Boot and is responsible for coordinating the execution of complex workflows that take data through acquisition, ingestion, transformation, normalization, and modeling with many different advanced machine learning algorithms. These processing steps are performed with a large number of technologies (Java, Python, R, KNIME, AWS SageMaker, etc.) and are often deployed as Docker containers for execution on either dedicated servers, EC2 instances, or as […]
April 24, 2019

Loading Transaction Fact Tables

This blog post focuses on loading transaction fact tables; subsequent posts on periodic and accumulating snapshots will follow in the coming weeks. Loading fact tables is very different from loading dimensions. First, we need to know the type of fact we are using. The major types of facts are transaction, periodic snapshot, accumulating snapshot, and time-span accumulating snapshot. We also need to know the grain of the fact, that is, the dimensional keys associated with the measurement event. Let’s say we want to measure product sales by customer, product, and date attributes. The source transaction system may provide […]
April 12, 2019

Data Preparation for Machine Learning: Vector Spaces

Machine learning algorithms often rely on certain assumptions about the data being mined. Ensuring data meet these assumptions can be a significant portion of the preparation work before model training and predicting begins. When I began my data science journey, I was blissfully unaware of this and thought my preparation was done just because I had stuffed everything into a table. Feeding these naïvely compiled tables into learners I wondered why some algorithms never seemed to perform well for me. As I began digging into the algorithms themselves, many referred to vector space operations as being fundamental to their function. […]
April 4, 2019

Oracle Certified Professional Exam Prep

How I prepared for (and passed) the Oracle Certified Professional exam.
March 20, 2019

Updating Type II Slowly Changing Dimensions

In this blog post I will provide an example of how I implement Type II dimensions in my data warehouses. The process I go through when refreshing data into a Type II dimension is to first land the data from the source system. I always create a separate schema to hold the data, as it is presented from the source system with no transformations. The tables and columns are named exactly as they are in the source system. The basic idea here is to quickly off-load the data from the source system, thereby eliminating any long running or resource intensive […]
March 6, 2019

Introduction to KNIME

Let’s say you’ve been tasked to pull data from a variety of data sources on a monthly basis, manipulating and processing it to create insights. At first one may look to a powerhouse software like Informatica, but what if you don’t really need something that heavy and your budget is extremely limited? You need an ETL tool to get you from point A to point B accurately and reasonably quickly, something that’s easy to learn but has the depth and flexibility for more advanced processing and customization. Enter KNIME. Up-front KNIME presents itself as a tool designed for data analysts, […]
February 20, 2019

Joining Fact Tables

Joining fact tables can be done, but it carries inherent risks, so take care whenever such a join is required. In the following simple scenario, we have a fact of authors to articles and a separate fact of articles to pageviews. Our customer has asked for the ability to 1) find the authors who have provided the most content in a given time period, 2) find the articles which have the greatest number of pageviews for a given time period, and 3) find the authors with the highest number of pageviews in a given time period. Our […]
January 23, 2019

Types of Data Models

A conceptual model is a representation of a system, made of the composition of concepts which are used to help people know, understand, or simulate a subject the model represents.[i] In dimensional modeling this is the most abstract level of modeling. Here we are only worried about the entities (dimensions) and their relationship to each other (facts). No attributes are required in the conceptual model as we are trying to work with the customer who is not an expert in databases or data warehousing but they do understand how the entities are related in their business. The conceptual model will help us […]
January 19, 2018
McLean, Virginia — 1-18-2018 — NuWave Solutions, a leader and innovator in Analytics and Data Management solutions, was selected to present a webcast on Self-Service Data Science at the Qlik Federal Tech Tuesday webcast on 23 January 2018. Brian Frutchey, Vice President at NuWave, will speak about how powerful predictive analytics can be made accessible to the “common man” by using technologies from our partners Celect and Qlik together. “NuWave has deep experience in self-service machine learning from Celect and self-service data exploration platform from Qlik, and has begun using them together for a variety of US Public Sector customers […]
December 5, 2017

NuWave Solutions Certified as a Great Workplace

McLean, Virginia — 11-13-2017 — NuWave was certified as a great workplace today by the independent analysts at Great Place to Work®. NuWave earned this credential based on extensive ratings provided by its employees in anonymous surveys. A summary of these ratings can be found at http://reviews.greatplacetowork.com/nuwave-solutions-llc. “We are honored to have received this recognition,” says Rob Castle, Chief Technology Officer at NuWave. “It’s gratifying to know how highly our employees feel about their workplace. Our workforce is the lifeblood of our company, and the foundation for success for ourselves and our customers.” “We are thrilled by this certification,” says […]
July 28, 2017

Data Warehouse Design Techniques – Constraints and Indexes

In this week’s blog, we will discuss constraints and indexes. In data warehousing, like in life, constraints are things we love to hate. Constraints keep us from making mistakes, which in most cases is a good thing, until we come across an anomaly which needs to be addressed but the constraints prevent this from happening. Most of the time indexes help us to find data faster, but that is not always the case.   What are Indexes? Indexes are data structures which hold field values from the indexed column(s) and pointers to the related record(s). This data structure is then […]
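The description of an index as field values plus pointers to the related records can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `rows` table, the `build_index` helper, and the use of list positions as "pointers" are hypothetical, and a real database would typically use a B-tree or similar on-disk structure rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical table: each list position stands in for a record pointer.
rows = [
    {"customer": "Acme", "amount": 100},
    {"customer": "Globex", "amount": 250},
    {"customer": "Acme", "amount": 75},
]


def build_index(table, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for position, row in enumerate(table):
        index[row[column]].append(position)
    return index


customer_index = build_index(rows, "customer")

# Lookup by value follows the pointers instead of scanning every row.
acme_rows = [rows[position] for position in customer_index["Acme"]]
acme_total = sum(row["amount"] for row in acme_rows)
```

The trade-off is visible even in this toy version: lookups by customer avoid a full scan, but every insert or update to `rows` would also have to maintain `customer_index`.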
July 25, 2017

NuWave Solutions Presenting Machine Learning on Dirty Data at the Department of Defense Intelligence Information System (DoDIIS) Conference in St. Louis

Brian Frutchey, a Vice President at NuWave, will speak about Machine Learning on Dirty Data. MCLEAN, VA, July 25, 2017 /24-7PressRelease/ — NuWave has accepted the opportunity to be among the group of presenters at the DoDIIS Conference in St. Louis from August 13 – 16, 2017. Brian Frutchey, a Vice President at NuWave, will speak about Machine Learning on Dirty Data. Machine learning is difficult for the DoD and Intelligence Community because their data often contains gaps and other irregularities, reducing the questions which can be addressed and contributing to the need for implementation by specialists. NuWave has solutions that uniquely […]
July 5, 2017

Data Warehouse Design Techniques – Aggregates

In this week’s blog, we will discuss how to optimize the performance of your data warehouse by using aggregates. What are Aggregates? Aggregates are the summarization of fact related data for the purpose of improved performance. There are many occasions when the customer wants to be able to quickly answer a question where the data is at a higher grain than we collect. To avoid slow responses due to system summing data up to higher levels we can pre-calculate these requests, storing the summarized data in a separate star. Aggregates can be considered to be conformed fact tables since they […]
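As a rough illustration of pre-calculating data at a higher grain, the Python sketch below rolls hypothetical daily sales facts up to a monthly aggregate. The `daily_sales` rows, the `build_monthly_aggregate` helper, and the choice of month and product as the aggregate grain are assumptions made for the example; in a warehouse the result would be stored as its own aggregate fact table rather than a dictionary.

```python
from collections import defaultdict
from datetime import date

# Hypothetical base fact rows at daily grain: (sale date, product, amount).
daily_sales = [
    (date(2017, 6, 1), "widget", 10.0),
    (date(2017, 6, 15), "widget", 5.0),
    (date(2017, 7, 2), "widget", 7.5),
    (date(2017, 6, 3), "gadget", 20.0),
]


def build_monthly_aggregate(facts):
    """Summarize daily facts to (year, month, product) grain."""
    totals = defaultdict(float)
    for sale_date, product, amount in facts:
        totals[(sale_date.year, sale_date.month, product)] += amount
    return dict(totals)


# Computed once at load time so monthly questions need no summing at query time.
monthly_sales = build_monthly_aggregate(daily_sales)
june_widget_sales = monthly_sales[(2017, 6, "widget")]
```

Queries at monthly grain then read one pre-summed row per month and product instead of summing every daily row on demand.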
June 16, 2017

Data Warehouse Design Techniques – Derived Schemas

Getting the correct answer is one of the most important requirements of a data warehouse, but this is only part of the requirement. The full requirement is to provide the correct information to the user at the right time. The information is no good to the user if they need the answer today but you need to write a custom report which will take a week to develop, test, and deploy. Although you will be able to provide the correct answer, it will be late and therefore of limited to no value to the decision-making process. How can we address […]
May 24, 2017

Loading Hierarchical Bridge Tables

This blog article is a follow-up to the Ragged Hierarchical Dimensions article I posted a few weeks ago. In that article, I described using a hierarchical bridge table at a high level; today I will discuss the nuances of this method in more detail. Let’s begin at the dimension table. In order to capture the data with the appropriate level of detail, you will need to design your dimension with recursive fields. I have designed this dimension table as a type 2 dimension table and I include the recursive (parent) field as an attribute. This allows me to capture the point-in-time […]
April 21, 2017

Data Warehouse Design Techniques – Accumulating Snapshot Fact Tables

Today we will look at a different type of fact table, the accumulating fact. Accumulating Fact Tables Accumulating fact tables are used to show progress through a well-defined business process and are most often used to research the time between milestones. These fact tables are updated as the business process unfolds and each milestone is completed. The most common examples can be found in order and insurance processing. Here a single fact table may contain the important business process milestones upon which the organization wishes to measure performance. A different version of this same accumulating fact table […]
April 5, 2017

Data Warehouse Design Techniques – Snapshot Fact Tables

Last week we defined the three types of fact tables and have established a solid foundation in the definition and use of transactional fact tables. This week we will focus on periodic snapshot fact tables. Snapshot Fact Tables (Periodic) Snapshot fact tables capture the state of the measures based on the occurrence of a status event or at a specified point-in-time or over specified time intervals (week, month, quarter, year, etc.). The Snapshot Fact table and the Transactional fact table can be very similar in design. In the design above the sales are rolled up to the month in the […]
March 29, 2017

Data Warehouse Design Techniques – Fact Tables

Now that we have established a solid foundation with dimension tables, we will turn our focus to the fact tables. Fact tables are data structures which capture the measurements of a particular business process. The measurements (quantity, amount, etc.) are defined by the collection of related dimensions. This collection of dimensional keys is called the grain of the fact. Types of Fact Tables The three basic fact table grains are the transactional, the periodic snapshot and the accumulating snapshot. Transactional fact tables are the most common fact in dimensional modeling. Transactional fact tables capture the measurement at its most […]