Paul Wisneskey

February 12, 2020

Making the Pass, Part 2: Training a Neural Network with KNIME

In my previous blog posting, I started to explore what it would take to teach a machine learning model how to throw a virtual football to a moving target. If you have not read that article, please go do so: it covers the definition of the problem we are trying to model and how I used KNIME’s parameter optimization nodes to generate the parameters for completing a pass for an arbitrary set of starting conditions. Now that we have figured out how to make a single pass given a set of starting conditions, the first step for this posting was […]
January 29, 2020

Making the Pass, Part 1: Parameter Optimization with KNIME

With the Super Bowl just around the corner here in the United States, I recently found myself trying to think of a football-themed blog posting. Based on the football theme, I decided to create a post about a fundamental aspect of the game: throwing the ball to a receiver (a “pass” in American football parlance). It is a simple operation that, with a little bit of practice, most people can perform intuitively without any knowledge of the physics equations that govern the projectile motion of the ball and the linear motion of the receiver. Maybe it would be possible to […]
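For readers who want a feel for what that post's parameter optimization is doing, here is a minimal Python sketch (not the KNIME workflow itself): a brute-force grid search over throw speed and launch angle for a receiver running in a straight line. All of the numbers, ranges, and geometry are illustrative assumptions, not values from the article.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def miss_distance(speed, angle_deg, receiver_start, receiver_speed):
    """Distance between where the ball lands and where the receiver is at that moment."""
    angle = math.radians(angle_deg)
    flight_time = 2 * speed * math.sin(angle) / G       # time for the ball to return to the ground
    ball_range = speed * math.cos(angle) * flight_time  # horizontal distance the ball travels
    receiver_pos = receiver_start + receiver_speed * flight_time
    return abs(ball_range - receiver_pos)

def optimize_pass(receiver_start, receiver_speed):
    """Brute-force search over speed/angle, mimicking a parameter optimization loop."""
    best = None
    for speed in range(10, 31):          # throw speed in m/s
        for angle_deg in range(10, 81):  # launch angle in degrees
            miss = miss_distance(speed, angle_deg, receiver_start, receiver_speed)
            if best is None or miss < best[0]:
                best = (miss, speed, angle_deg)
    return best

miss, speed, angle = optimize_pass(15.0, 5.0)
print(f"best: {speed} m/s at {angle} deg, miss {miss:.2f} m")
```

KNIME's parameter optimization loop nodes do essentially this search (with smarter strategies available than exhaustive enumeration), feeding candidate parameters through the workflow and keeping the best-scoring set.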
January 15, 2020

iTunes Library Cleanup: XML and String Distances in KNIME

I have been a full-time telecommuter for over 18 years, so when I am not on a conference call, I am usually listening to music while I work. Over this span of my career I have accumulated a large music library which I manage in iTunes and stream to various devices around my house. As my collection has grown, I’ve tried to be organized and diligent about keeping the metadata of my music cleaned up. But it has inevitably gotten messy over time, particularly since my children have hit their teenage years and started to add their own music to […]
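The kind of fuzzy string matching that post relies on can be sketched with Python's standard library. The artist tags and the 0.6 similarity threshold below are illustrative assumptions, not the post's actual data or settings.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical after case-folding."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical artist tags that should collapse to one canonical value.
tags = ["The Beatles", "the beatles", "Beatles, The", "Radiohead"]
canonical = "The Beatles"
matches = [t for t in tags if similarity(t, canonical) > 0.6]
print(matches)
```

KNIME's String Distances / Similarity Search nodes offer the same idea with more distance functions (Levenshtein, n-gram, etc.); the point is simply that near-duplicate metadata values score high against each other while unrelated values score low.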
December 12, 2019

Let It Snow: Generating Fractal Snowflakes with KNIME

Continuing the holiday theme of my previous blog entry, today I decided to write a quick post on how to generate fractal snowflakes using KNIME, the industry-leading open source data science platform. While I’d be comfortable coding this in any number of programming languages, the very nature of KNIME’s desktop application makes it simple to rapidly explore and visualize data in a way that lends itself to fun mathematical explorations, even for non-programmers. However, as with many data science tasks, in the midst of my explorations today I found myself confronted with anomalous results that I initially couldn’t explain. […]
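A Koch snowflake like the ones generated in that post can be sketched in a few lines of plain Python. This is the generic textbook construction, not the KNIME workflow from the article, and it only produces the point list; plotting is left out.

```python
import math

def koch_curve(p1, p2, depth):
    """Recursively replace segment p1->p2 with the four Koch sub-segments.

    Returns the start point of each resulting segment (endpoints chain together).
    """
    if depth == 0:
        return [p1]
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = (x2 - x1) / 3, (y2 - y1) / 3
    a = (x1 + dx, y1 + dy)            # one third along the segment
    b = (x1 + 2 * dx, y1 + 2 * dy)    # two thirds along the segment
    # Peak of the triangular bump: rotate the middle-third vector by 60 degrees.
    cos60, sin60 = math.cos(math.radians(60)), math.sin(math.radians(60))
    peak = (a[0] + dx * cos60 - dy * sin60, a[1] + dx * sin60 + dy * cos60)
    points = []
    for s, e in [(p1, a), (a, peak), (peak, b), (b, p2)]:
        points.extend(koch_curve(s, e, depth - 1))
    return points

def snowflake(depth):
    """Three Koch curves around an equilateral triangle."""
    corners = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    pts = []
    for i in range(3):
        pts.extend(koch_curve(corners[i], corners[(i + 1) % 3], depth))
    return pts

print(len(snowflake(3)))  # each recursion level multiplies the segment count by 4
```

Each side splits into four sub-segments per level, so the snowflake has 3 × 4^depth segments; that exponential growth is exactly what makes depth a parameter worth looping over in a KNIME recursive or counting loop.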
November 25, 2019

Baking an Approximate Pi with KNIME Using a Monte Carlo Recipe

With the holiday season in the United States rapidly approaching, I thought I’d have some fun with this blog posting. So rather than giving you a typical blog entry highlighting some specific cool pieces of the awesome open-source KNIME end-to-end data science platform, today I am going to teach you how to bake an Approximate Pi with KNIME using a Monte Carlo recipe. And along the way I will show you some interesting cooking tools and techniques to add to your KNIME kitchen. What is an Approximate Pi, you ask? It is simply an estimate of the irrational […]
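The Monte Carlo recipe itself is compact enough to sketch in plain Python. This is the standard quarter-circle sampling estimate, not the KNIME workflow from the post:

```python
import random

def approximate_pi(samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling random points in the unit square and
    counting how many fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Quarter-circle area / square area = pi/4, so pi is about 4 * ratio.
    return 4 * inside / samples

print(approximate_pi(100_000))
```

The estimate converges slowly (error shrinks roughly with the square root of the sample count), which is why the KNIME version is a nice excuse to play with loop nodes and large row counts.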
November 6, 2019

Leveraging Athena with KNIME in a Robust Manner, Part 2

In my previous blog posting, I introduced an issue we were having with seemingly random intermittent failures using Amazon Web Services’ Athena backed by a large number of data files in S3. The issue was arising because S3 is eventually consistent, and occasionally queries were being executed before their underlying data files were fully materialized in S3. Our solution was to introduce try/catch KNIME nodes with a loop to retry failed queries a few times in case of intermittent failures. To do this we had to do our own flow variable resolution in the Athena SQL queries since the standard […]
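The retry pattern described above looks roughly like this in plain Python: a sketch of the same try/catch-plus-loop idea, with a simulated flaky query standing in for Athena. The exception type, attempt counts, and delays are all illustrative assumptions.

```python
import time

class TransientQueryError(Exception):
    """Stand-in for the intermittent failure mode the post describes."""

def run_with_retries(query_fn, max_attempts=3, base_delay=1.0):
    """Retry a flaky operation with a growing delay, mirroring the
    try/catch-plus-loop pattern built from KNIME nodes."""
    for attempt in range(1, max_attempts + 1):
        try:
            return query_fn()
        except TransientQueryError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * attempt)  # wait longer before each retry

# Simulated query that fails twice before succeeding.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientQueryError("partition file not yet visible in S3")
    return "rows"

print(run_with_retries(flaky_query, base_delay=0.01))
```

The key property is that only the transient error is caught and retried; anything else still fails fast, just as the KNIME try/catch construct only absorbs the failures you route into it.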
October 24, 2019

Leveraging Athena with KNIME in a Robust Manner, Part 1

Recently, we began experiencing seemingly random intermittent failures in one of our modeling workflows used in VANE, an advanced predictive analytics project we are building for the US Army.  These failures were occurring with varying frequency inside of any one of several Database SQL Executor nodes.  These nodes were performing a large number of SQL queries against Amazon Web Services’ Athena backed by a large number of data files in S3.   The workflow was randomly failing in any of these nodes with a HIVE SPLIT error due to a missing partition file in S3.  When we investigated the failure, we […]
August 6, 2019

Batch Execution of KNIME Workflows, Part 2

In an earlier blog posting I walked through the steps required to install KNIME in a Docker container so that it could be used to run workflows in batch mode. This is a very useful technique for automating workflow execution and leveraging cloud resources to do so. However, most workflows are not self-contained: they need access to configuration, external data, and local storage. I did not cover those aspects in my first posting, so this blog entry will introduce ways to pass in configuration and support the batch execution of KNIME workflows in a container. Setting Flow Variables The […]
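For reference, passing flow variables into a batch run boils down to building the KNIME batch command line. The flag names below reflect KNIME's batch executor as I understand it and may vary by version; the workflow path and variables are made up for illustration.

```python
def knime_batch_command(workflow_dir, variables):
    """Build a KNIME batch-mode command line, passing flow variables
    with the -workflow.variable=name,value,type syntax."""
    cmd = [
        "knime",
        "-nosplash",
        "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
        f"-workflowDir={workflow_dir}",
    ]
    for name, (value, vtype) in variables.items():
        cmd.append(f"-workflow.variable={name},{value},{vtype}")
    return cmd

cmd = knime_batch_command(
    "/workflows/etl",
    {"input_path": ("/data/in.csv", "String"), "batch_size": (500, "int")},
)
print(" ".join(cmd))
```

In a container, a wrapper like this (or a plain shell entrypoint doing the same thing) lets the orchestrator inject per-run configuration as flow variables without touching the workflow itself.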
August 1, 2019

Automated Execution of Multiple KNIME Workflows

When using the open source KNIME Analytics Platform to build sophisticated data processing and analysis pipelines, I often find myself building workflows that consist of so many nodes they become difficult to manage.  This complexity can be managed by grouping nodes into processing stages and then bundling those stages into meta-nodes so that the overall workflow layout is easier to follow. However, I’ve found that this approach still leaves workflows unwieldy to work with as you still have to open the meta-nodes to explore and troubleshoot their processing.  Over the years I’ve worked with KNIME, I’ve developed a habit of […]
July 30, 2019

Batch Execution of KNIME Workflows Using Docker

In my previous blog posting I introduced our service management platform Dex, which is at the core of many of our advanced analytics solutions. Dex seamlessly manages the coordination of the data acquisition, ingestion, transformation, and modeling for our system via processing steps that are often deployed as Docker containers for execution on either dedicated servers, EC2 instances, or as AWS Fargate tasks. One of the core applications we leverage for many of these steps is the open source KNIME analytics platform (www.knime.org). KNIME’s easy-to-use GUI allows us to quickly develop and test ETL and modeling workflows. Once […]
May 1, 2019

Client Identification Using Custom Request Headers in Spring Boot

One of the key building blocks of NuWave’s advanced predictive analytics solutions is a service management platform we built called Dex, which is short for Deus Ex Machina. Dex was built as a collection of microservices using Spring Boot and is responsible for coordinating the execution of complex workflows which take data through acquisition, ingestion, transformation, normalization, and modeling with many different advanced machine learning algorithms. These processing steps are performed with a large number of technologies (Java, Python, R, KNIME, AWS SageMaker, etc.) and are often deployed as Docker containers for execution on either dedicated servers, EC2 instances, or as […]
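On the client side of that scheme, tagging requests with an identifying header can be sketched in plain Python with the standard library. The X-Client-Id header name and URL here are illustrative assumptions, not necessarily what Dex actually uses.

```python
import urllib.request

CLIENT_ID_HEADER = "X-Client-Id"  # hypothetical header name, not necessarily Dex's

def identified_request(url: str, client_id: str) -> urllib.request.Request:
    """Build a request tagged with a custom header so a server-side
    interceptor can tell which client or container is calling."""
    req = urllib.request.Request(url)
    req.add_header(CLIENT_ID_HEADER, client_id)
    return req

req = identified_request("http://localhost:8080/api/status", "knime-etl-worker")
# Note: urllib normalizes stored header keys to capitalized form ("X-client-id").
print(req.get_header("X-client-id"))
```

On the Spring Boot side, a `HandlerInterceptor` (or filter) would read the same header and reject or log requests without a recognized client identifier; the contract is simply the agreed-upon header name.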