SQL

November 6, 2019

Leveraging Athena with KNIME in a Robust Manner, Part 2

In my previous blog post, I introduced an issue we were having with seemingly random intermittent failures using Amazon Web Services’ Athena backed by a large number of data files in S3. The issue arose because S3 is eventually consistent, so queries were occasionally executed before their underlying data files were fully materialized in S3. Our solution was to introduce KNIME Try/Catch nodes inside a loop so that a failed query is retried a few times in case of an intermittent failure. To do this, we had to perform our own flow variable resolution in the Athena SQL queries, since the standard […]
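The retry-on-intermittent-failure idea can be sketched outside KNIME in a few lines of Python. This is not the authors' workflow, which used KNIME Try/Catch and loop nodes. It is a minimal illustration that assumes a hypothetical `query` callable, which submits one Athena query and raises an exception on failure. The attempt count and delay are illustrative values, not taken from the post.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(query: Callable[[], T], max_attempts: int = 3,
                     delay_seconds: float = 5.0) -> T:
    """Call `query`, retrying up to `max_attempts` total attempts.

    Assumption: any exception is treated as possibly transient (e.g. a data
    file not yet visible in S3). A real workflow might retry only on specific
    error types.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return query()
        except Exception as err:
            last_error = err
            # Wait before the next attempt to give S3 time to become consistent.
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: surface the last underlying error as the cause.
    raise RuntimeError(f"query failed after {max_attempts} attempts") from last_error


# Usage with a hypothetical stand-in query that fails twice, then succeeds.
calls = {"n": 0}


def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("simulated missing S3 split")
    return ["row1", "row2"]


print(run_with_retries(flaky_query, max_attempts=3, delay_seconds=0))
```

Retrying a bounded number of times keeps a genuinely broken query from looping forever. The final exception is chained with `from`, so the original error remains visible for debugging.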
October 24, 2019

Leveraging Athena with KNIME in a Robust Manner, Part One

Recently, we began experiencing seemingly random intermittent failures in one of our modeling workflows used in VANE, an advanced predictive analytics project we are building for the US Army. These failures occurred with varying frequency inside any one of several Database SQL Executor nodes. These nodes were running a large number of SQL queries against Amazon Web Services’ Athena, which was backed by a large number of data files in S3. The workflow was randomly failing in any of these nodes with a HIVE SPLIT error caused by a missing partition file in S3. When we investigated the failure, we […]