By: Matthew T. Farrington
The Oracle Endeca Information Discovery (OEID) platform provides an incredibly powerful, intuitive, and fast means of digging through complex data-sets of varied structures. All of us at NuWave Solutions celebrated when we received our Oracle Partner Network (OPN) status specializing in OEID in December 2013.
The best way to see the capabilities of software is to build a simple application and start analyzing some data. Fortunately, you don’t need a team of developers, database experts, or front-end designers to construct a robust data discovery application. Endeca Studio 3.1 comes with an application builder tool that takes a given data-set and produces a simple application, which you can then customize as you wish. This tutorial will walk you through application creation and component configuration, starting you down the path of data discovery.
To quickly become familiar with Endeca Studio 3.1 we are going to create a new application that leverages Chicago crime statistics to examine why so many police reports do not result in an arrest. For the sake of simplicity, we are going to utilize a publicly hosted Endeca Studio server to host the new application.
Please note that the data-set used in this demo is real-time crime data from the City of Chicago. As you work through this demo you may see different patterns emerge, because the raw data is continually changing.
Endeca 3.1 will automatically generate and populate a Studio application, complete with a search box, available refinements, breadcrumbs, and several charts, when a data source is selected. To start investigating why so many police reports do not result in an arrest, we will build an application for the Chicago crime data set.
1. Access the public Studio Server at http://slc02oku.oracle.com:7101/eid/web/ and log in
a. Username: firstname.lastname@example.org
b. Password: Admin123
2. Select “New Application.”
3. Give the application a name, then select “Use a Pre-Built Endeca Server” in the lower right. In the “Select a Managed Data Connection” drop-down, select “Public Security.” Endeca Studio will now generate a prebuilt application based on the Chicago crime data.
Since we only care about records that didn’t result in an arrest, we are going to limit the data being shown in Studio. Let’s eliminate all reports that did result in an arrest.
1. Apply a Refinement Filter for “Arrest:false” in the Available Refinements panel on the left, or select the “false” bar in the bar chart. Now only records that did not result in an arrest are displayed.
2. There are many minor offences for which we would not expect an arrest (Trespass, Liquor Law violation, etc.), so let’s eliminate some of them.
a. Select the drop-down in the Available Refinements for Primary Type, then apply a negative refinement filter by selecting the circle-slash icon on some of the entries to exclude them from the data set.
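For readers who think in code, the two filters above can be sketched in Python. This is only an illustration of the logic, not how Endeca implements refinements: the field names, the helper functions `refine` and `exclude`, and the sample records are all hypothetical.

```python
# Hypothetical sample records; the values are invented for illustration
# and are not taken from the Chicago data set.
records = [
    {"primary_type": "THEFT", "arrest": False},
    {"primary_type": "BATTERY", "arrest": False},
    {"primary_type": "TRESPASS", "arrest": False},
    {"primary_type": "THEFT", "arrest": True},
    {"primary_type": "LIQUOR LAW VIOLATION", "arrest": False},
]


def refine(rows, attribute, value):
    """Keep only rows whose attribute equals value (a refinement filter)."""
    return [r for r in rows if r[attribute] == value]


def exclude(rows, attribute, values):
    """Drop rows whose attribute is one of values (a negative refinement)."""
    excluded = set(values)
    return [r for r in rows if r[attribute] not in excluded]


# Step 1: keep reports that did not result in an arrest.
no_arrest = refine(records, "arrest", False)

# Step 2: exclude minor offences where no arrest would be expected.
filtered = exclude(no_arrest, "primary_type",
                   ["TRESPASS", "LIQUOR LAW VIOLATION"])

for row in filtered:
    print(row["primary_type"])
```

Each filter narrows the result of the previous one, which is also how the breadcrumbs in Studio accumulate as you apply refinements.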
Now we have limited our record set to crimes where we would expect an arrest but none was made. Let’s modify our chart component to show which types of report most commonly do not result in an arrest.
1. Our current chart was auto-configured by the application; however, it does not include “Primary Type” as a category axis option. We can quickly change this.
a. Select the gear in the upper right of the chart component
b. In the tab bar, select “Chart Configuration,” then drag the “Primary Type” property into the dimensions list. You may need to remove an existing dimension to make room in the list; to do so, select the small “x” next to that dimension.
c. Select Save and Exit.
2. Select the category axis drop-down on the chart component and choose Primary Type.
a. To order the chart, select the “Sort” drop-down and choose “Primary Type by Record Count”
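Conceptually, the chart now groups the filtered records by Primary Type and orders the groups by how many records each contains. A minimal Python sketch of that aggregation follows. The `count_by` helper and the sample values are hypothetical, and this says nothing about how Studio computes the chart internally.

```python
from collections import Counter

# Hypothetical, already-filtered records (no arrest made).
records = [
    {"primary_type": "THEFT"},
    {"primary_type": "BATTERY"},
    {"primary_type": "THEFT"},
    {"primary_type": "CRIMINAL DAMAGE"},
    {"primary_type": "BATTERY"},
    {"primary_type": "THEFT"},
]


def count_by(rows, attribute):
    """Return (value, record count) pairs, largest count first."""
    counts = Counter(r[attribute] for r in rows)
    return counts.most_common()


by_type = count_by(records, "primary_type")
for value, count in by_type:
    print(f"{value}: {count}")
```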
Now our chart displays Theft, Battery, and Criminal Damage as the top three types of report that did not result in an arrest. Let’s plot them on a map to see if we can glean any additional information.
1. Select “Add Component” in the upper right and drag a map component above our chart.
a. The pre-populated map displays two layers, Crime Incidents and Social Incidents. We only care about Crime Incidents, so uncheck the “Social” check box.
b. Use the scroll wheel of your mouse to zoom in on Chicago (or use the zoom bar on the left)
2. The pins show all of the reports in the Chicago area where no arrest was made. They seem fairly evenly distributed, so let’s limit our search to just thefts to see if anything jumps out.
a. Add a refinement filter either by selecting the Theft bar in the chart or by selecting “Theft” under the “Primary Type” group in the Available Refinements sidebar.
3. Now we see a group of thefts on the south side of Chicago. To isolate these data points, select the circle icon and drag a circle around them. This adds a refinement filter for these specific records.
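One way to think about the circle selection is as a distance test: keep every record whose coordinates lie within a given radius of the circle's center. The sketch below uses the haversine great-circle formula for that test. It is an assumption made for illustration, since the tutorial does not describe how Studio evaluates the circle, and the coordinates, the radius, and the `within_circle` helper are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_circle(rows, center_lat, center_lon, radius_km):
    """Keep rows whose (lat, lon) lies inside the circle."""
    return [
        r for r in rows
        if haversine_km(r["lat"], r["lon"], center_lat, center_lon) <= radius_km
    ]


# Hypothetical theft records; coordinates are invented for illustration.
thefts = [
    {"id": 1, "lat": 41.751, "lon": -87.601},  # close to the center
    {"id": 2, "lat": 41.950, "lon": -87.650},  # roughly 22 km to the north
]

selected = within_circle(thefts, center_lat=41.750, center_lon=-87.600,
                         radius_km=2.0)
print([r["id"] for r in selected])
```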
Let’s see if a specific beat in this part of the city has a prevalence of thefts that do not result in arrests.
1. In the chart component, select “Beat” under the category axis drop-down, then “Beat by Record Count” under the sort drop-down.
2. One beat has 12 recorded incidents where no arrest was made. Let’s select the bar on the chart to take a closer look at these records.
3. Below in the Results Table we can view these records in more detail. Select the drop down and choose “Incident Dimensions” to specify the column set to view. Now select the “Location Description” heading to sort the column by location.
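The drill-down in these steps amounts to finding the beat with the most records, keeping only that beat's records, and sorting them by location. A rough Python sketch is shown here; the beat numbers, location values, and field names are hypothetical and stand in for whatever the live data contains.

```python
from collections import Counter

# Hypothetical records inside the circled area; values are invented.
records = [
    {"beat": "0421", "location_description": "RESIDENCE"},
    {"beat": "0421", "location_description": "APARTMENT"},
    {"beat": "0421", "location_description": "STREET"},
    {"beat": "0422", "location_description": "STREET"},
]

# "Beat by Record Count": find the beat with the most records.
beat_counts = Counter(r["beat"] for r in records)
top_beat, top_count = beat_counts.most_common(1)[0]

# Selecting the bar keeps only that beat; sorting the column orders by location.
top_records = sorted(
    (r for r in records if r["beat"] == top_beat),
    key=lambda r: r["location_description"],
)

print(top_beat, top_count)
for row in top_records:
    print(row["location_description"])
```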
Of the 12 records available, we see that most occur in residential locations. This could be because the area is primarily residential, or criminals could be intentionally targeting residences in this area and not getting arrested. While more analysis of this data is needed to come to a firm conclusion, the Oracle Endeca application allowed us to quickly examine this criminal data and start down the path of taking action to address crime in the city of Chicago.