
How We Built an Award-Winning Data Science Tool In a Single Month

NuWave Solutions participated in the 2020 AFCEA Intelligence & National Security Summit’s EPIC App Challenge last week. The challenge was to use AI to identify an unstructured data set and combine it with a structured data set to produce meaningful, accurate, and compliant results and to show the solution uses its intelligence to: go beyond basic information; work out content automatically; learn from human behavior; and make predictions and forecasts for better insights and intelligent decision making.

OVERVIEW

For our solution, Team NuWave created a new Strategic Analysis Tool from scratch using open-source data and a Microsoft Azure-centric architecture. Our requirements were derived from the daily challenges of the data analyst. Data analysts are drowning in data, and this deluge is impairing their ability to find accurate information and deliver it to their superiors in a timely fashion. To address this challenge, we created a solution that can ingest both structured and unstructured news items from the Global Database of Events, Language, and Tone (GDELT) as well as structured data from Google Trends. Our goal was to create an intuitive entity-comparative tool that could be used to quickly make sense of the torrent of data via time-series visualizations. Our analysis tool also dives deep into the data, surfacing underlying correlations that a human analyst would be unlikely to find unaided. We see this tool as a potential offering to the DoD, Department of State, and Homeland Security, supporting their military or diplomatic missions.

 

MISSION

For this challenge, we built an analytical tool to support Open Source Intelligence (OSINT) on news-based data. While OSINT is an extremely valuable discipline for predictive strategic analysis, its analytical tools are not as well developed as those for the more traditional INTs. We attempted to produce a tool that helps fill this gap.

This product features a modifiable dashboard presenting descriptive and predictive analytic visualizations. These visualizations compare multiple aspects of media data of two countries centered around a specific search term. Notable aspects of this product are the intuitive side-by-side comparison of descriptive metadata from each country as well as a graphical representation of the top four drivers of the projected sentiment within our predictive visualizations.

 

DATA

We collected two modalities of news media data to power our models:

Global Database of Events, Language, and Tone (GDELT) is an open data service featuring unstructured news media data from print and web formats from around the globe. GDELT provided two types of metadata useful to our efforts:

  • Tone: a measure of the average sentiment of all related media documents (3 years)
  • Volume: quantifying each search term as the percent of all worldwide coverage over the selected time period (3 years)
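The arithmetic behind these two measures can be sketched as follows. This is a minimal illustration, assuming tone is a plain average of per-document sentiment scores and volume is the matching share of all monitored articles; the function names and the sample numbers are hypothetical and do not reproduce GDELT's own computation.

```python
from statistics import mean
from typing import Sequence


def volume_percent(matching_articles: int, total_articles: int) -> float:
    """Share of all monitored coverage that mentions the search term, in percent."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    return 100.0 * matching_articles / total_articles


def average_tone(document_tones: Sequence[float]) -> float:
    """Mean sentiment score across the documents that matched the search term."""
    if not document_tones:
        raise ValueError("need at least one document tone")
    return mean(document_tones)


# Hypothetical counts and per-document tone scores (illustrative values only).
daily_volume = volume_percent(matching_articles=120, total_articles=48_000)
daily_tone = average_tone([-2.5, 1.0, -4.0, 0.5])
```

Computing both measures per day and per country yields the two time series that the dashboards compare side by side.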

Additionally, we included Google news search trends, which consist of structured data featuring peak and trough popularity measurements by search term and by country.

 

ARCHITECTURE

We collected the news data via REST services into a SQL DB on Azure. We then employed Power BI and the Azure Machine Learning Studio to create descriptive and predictive visualizations that draw meaningful analytical results from a tremendous amount of data. This architecture is agile, allowing for more data inputs based on mission needs.
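The ingestion step can be pictured with a small sketch. Everything here is an assumption made for illustration: the JSON field names are not the real GDELT or Google Trends response schema, the table layout is hypothetical, and an in-memory SQLite database stands in for the Azure SQL DB so the example runs anywhere.

```python
import json
import sqlite3

# Hypothetical payload shaped like a tone/volume time series from a REST call.
SAMPLE_RESPONSE = json.dumps({
    "term": "nuclear talks",
    "country": "US",
    "points": [
        {"date": "2020-09-01", "tone": -1.8, "volume": 0.21},
        {"date": "2020-09-02", "tone": -2.4, "volume": 0.27},
    ],
})


def load_points(conn, payload):
    """Parse one JSON payload and upsert its points; return the rows written."""
    doc = json.loads(payload)
    rows = [
        (doc["term"], doc["country"], p["date"], p["tone"], p["volume"])
        for p in doc["points"]
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO news_metrics VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for the Azure SQL database
conn.execute(
    """CREATE TABLE news_metrics (
           term TEXT, country TEXT, day TEXT, tone REAL, volume REAL,
           PRIMARY KEY (term, country, day))"""
)
written = load_points(conn, SAMPLE_RESPONSE)
load_points(conn, SAMPLE_RESPONSE)  # re-ingesting the same day must not duplicate rows
row_count = conn.execute("SELECT COUNT(*) FROM news_metrics").fetchone()[0]
```

Keying the table on term, country, and day keeps repeated pulls idempotent, which matters when the same date range is fetched more than once.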

 

DEMO

We provided a demonstration of the product. In Figure 1, below, you will see the descriptive analytics portion of our product. Here we look at the data as a time series to assess the changes that have occurred over time. This historical view allows us to see anomalies and trends in the data. On its own, however, it has limited value because it is a backward-looking analysis. I like to compare it to driving while staring at your rearview mirror. Spend too much time here and you will find yourself in a wreck on the side of the road.

Figure 1. – Descriptive Dashboard

This information is a critical input to the predictive portion of our product. This data, when analyzed by a plethora of CPUs, can reveal patterns and anomalies that individuals simply cannot find on their own. To find these patterns and anomalies, we used a VAR (Vector Auto-Regressive) algorithm for prediction and the statsmodels Granger causality test to identify the prediction drivers.
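To make the VAR idea concrete, here is a minimal sketch of a first-order model, y_t = c + A y_{t-1}, fitted by ordinary least squares in plain Python. It is a stand-in written for illustration, not the statsmodels implementation used in the product; the helper names, the coefficient values, and the noise-free synthetic two-series history are all assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_var1(series):
    """Fit y_t = c + A y_{t-1} by least squares, one equation per variable.

    series is a list of observations, each a list of k floats.
    Returns k coefficient rows of the form [c, a_1, ..., a_k].
    """
    k = len(series[0])
    design = [[1.0] + list(prev) for prev in series[:-1]]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k + 1)]
           for i in range(k + 1)]
    coefs = []
    for v in range(k):
        target = [obs[v] for obs in series[1:]]
        xty = [sum(row[i] * y for row, y in zip(design, target))
               for i in range(k + 1)]
        coefs.append(solve(xtx, xty))
    return coefs


def forecast(coefs, last, steps):
    """Roll the fitted equations forward from the last observation."""
    path = []
    for _ in range(steps):
        last = [c[0] + sum(a * x for a, x in zip(c[1:], last)) for c in coefs]
        path.append(last)
    return path


# Synthetic, noise-free history for two hypothetical tone series.
TRUE_COEFS = [[1.0, 0.5, 0.2], [0.5, 0.1, 0.4]]
history = [[4.0, -1.0]]
history += forecast(TRUE_COEFS, history[0], 11)

fitted = fit_var1(history)
next_steps = forecast(fitted, history[-1], 3)
```

Because each series is regressed on the lagged values of every series, a forecast for one country's tone can draw on the recent behavior of the other. A production model would use more lags and real, noisy data.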

In Figure 2 we see that an Italian protest had a lagging impact on the tone of United States coverage of nuclear talks compared to that of China. Why might this protest over the cancellation of the Italian football (soccer) season impact the US tone on Nuclear Talks? Maybe because Italy is a member of NATO and is a generally free and open society like the US. This correlation could also be driven by the fact that the soccer season cancellation was driven by a new spike in COVID cases, which could be seen as a precursor to a second wave of new COVID cases in the US, making the Nuclear Talks topic less relevant. An individual would never notice this correlation between the protest in Europe and the decrease in the tone of Nuclear Talks in the US, but CPUs were able to find it. Please note we are speaking of correlation, not causation. We are not saying that the protest in Italy caused a decrease in Nuclear Talk tone in the US, but we are saying that the two events are statistically related in some way.

Figure 2. – Predictive Dashboard
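The idea behind a Granger causality test can be sketched in a few lines: fit one regression that predicts a series from its own past, fit a second that also includes the past of a candidate driver, and compare the residuals with an F statistic. The version below is a hand-rolled, lag-1 illustration on synthetic data, not the statsmodels `grangercausalitytests` routine, which also handles multiple lags and reports p-values. All names and parameters are assumptions.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(rows, target):
    """Residual sum of squares of a least-squares fit of target on rows."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, y in zip(rows, target))


def granger_f(caused, causing):
    """Lag-1 F statistic: does the past of `causing` help predict `caused`?"""
    target = caused[1:]
    restricted = [[1.0, c] for c in caused[:-1]]
    unrestricted = [[1.0, c, d] for c, d in zip(caused[:-1], causing[:-1])]
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - 3  # observations minus parameters in the full model
    return (rss_r - rss_u) / (rss_u / dof)


# Synthetic data: `follower` echoes `driver` one step later, plus a little noise.
rng = random.Random(0)
driver = [rng.gauss(0, 1) for _ in range(200)]
follower = [0.0]
for t in range(1, 200):
    follower.append(0.8 * driver[t - 1] + 0.1 * rng.gauss(0, 1))

f_driver_to_follower = granger_f(follower, driver)
f_follower_to_driver = granger_f(driver, follower)
```

A large F statistic in one direction and a small one in the other says only that one series helps forecast the other. As with the Italian protest example, that is evidence of a predictive relationship, not proof of causation.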

DEMO TO PRODUCTION

Our use case for this challenge resonated with our team, as we have seen first-hand how hard it can be for analysts to quickly make sense of the onslaught of raw, unstructured data. The ability to break down OSINT into streamlined dashboards in real time would improve the speed at which analysts can get data-driven insights into the hands of decision-makers.

So how do we get this from a hackathon demo to a real product?

The dashboards are the face of the product and can be dynamically altered to tailor to the needs of the individual analyst. The main thing we would focus on is improving the data layer behind the dashboards to expand the depth and breadth of the possible analyses. We can do this by incorporating more data. Some additional sources we have identified include social media, other open-source data providers, government-provided data, and other authoritative data sources. Again, with more data, we can open up further descriptive views and expand the predictive capabilities of the product, thereby providing the analyst with a more holistic picture of the area of interest.

We would also like to build further machine learning into the product. For example, an image processing capability that extracts keywords from news images would allow us to further link the news articles to current events. In a day and age when people scroll faster than they can read, this type of image-to-meaning translation could provide insights that we previously obtained only by analyzing text.

A final capability we would like to incorporate is an automated alerting capability. Once an analyst has configured a dashboard around an area of interest, they will not want to have to check back each and every day to see how things progress. With automated alerts, the analyst would be made aware of any changes (divergence from normal) or major news events as soon as they occur. For example, they could be alerted when the forecasted tone on Nuclear Matters in China is expected to plummet, giving them a further lead time to investigate what could potentially be a major world event.
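One simple way to define "divergence from normal" is a z-score: how many standard deviations the forecasted value sits from the recent history. The sketch below assumes that rule with a threshold of three standard deviations. The function names, the threshold, and the sample tone values are illustrative assumptions, not a description of a planned implementation.

```python
from statistics import mean, stdev


def z_score(history, value):
    """Distance of value from the historical mean, in sample standard deviations."""
    mu, sigma = mean(history), stdev(history)  # stdev needs at least two points
    if sigma == 0:
        raise ValueError("history has no variation; z-score is undefined")
    return (value - mu) / sigma


def should_alert(history, forecast_value, threshold=3.0):
    """True when the forecast diverges from recent history by more than threshold sigmas."""
    return abs(z_score(history, forecast_value)) > threshold


# Hypothetical recent daily tone values for one term and country.
recent_tone = [-1.0, -1.2, -0.8, -1.1, -0.9]

plunge_alert = should_alert(recent_tone, forecast_value=-2.5)   # sharp forecasted drop
routine_alert = should_alert(recent_tone, forecast_value=-1.1)  # within the normal range
```

Running a check like this against each configured dashboard on a schedule would let the analyst hear about a forecasted plunge without opening the dashboard every day.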

Those are just a few plans on how this product could further help analysts digest the flood of OSINT data, improving the value of their analysis.

 

CONCLUSION

NuWave Solutions is currently providing Predictive Analytics, Data Science, and Anticipatory Intelligence support for many US defense and IC projects. We see intelligence and data analysts drowning in a tidal wave of data that is preventing them from performing high-value strategic tasks. This tsunami of data is also causing these analysts to miss critical correlations found within the data, putting the mission at a strategic disadvantage. We understand that an intuitive entity-comparative tool such as the one we have demoed here is needed now more than ever. We believe that today’s analysts’ tools need to be able to quickly identify propaganda and public manipulation, to recognize leading trends and indicators, and to alert users to the need for action.

 

We believe this analytics and anticipatory intelligence toolset, one that combines descriptive and predictive analytics in an easy-to-understand visual format, can help analysts not only stay afloat against the deluge of data but also focus on the actual knowledge work they were originally intended to do: work that makes the world a safer and more peaceful place.

 

Finally, I would like to thank the other members of Team NuWave who helped put together such a compelling solution on nights and weekends. Chris Phelan, a new Data Scientist at NuWave, presented the descriptive analytics demo. Raja Chithambaram, a Sr. Data Scientist at NuWave, showed amazing insights using the predictive analytics engine of our solution. Samantha Hamilton, Data Scientist, helped define the requirements for the project and described the next steps we will take to convert this tool from a hackathon demo into a real product. Thanks also to Mike Rogers, Data Scientist, who helped with the retrieval and ingestion of the data and lent his expertise in defining the requirements that address the needs of today’s data and intelligence analysts.
