Because of their utility in retail and business systems, Recommender Systems are among the most popular applications of predictive analytics. A Recommender System leverages data about the behaviors of individuals to make personalized predictions of a targeted individual’s future behavior. A well-known application is predicting which products a customer is likely to buy so that those products can be recommended for purchase. Another well-known and well-studied application is predicting which movies will interest an individual, based on that individual’s viewing habits and on the viewing habits of other individuals who share some attitude or trait that indicates a common interest in specific movies.
Though there are several methods for implementing a Recommender System, Collaborative Filtering has become the de facto standard. Collaborative Filtering is an approach to Recommender Systems that is commonly implemented with a machine learning technique called Matrix Factorization, which became famous within the data science community through its use in the winning solution to the one-million-dollar Netflix Prize (often called the Netflix Challenge). In case you missed it, the goal of the competition was to improve on the algorithm Netflix used to make personalized movie recommendations to its users. More specifically, a better solution would be one that more accurately predicts the rating a specific user gives a specific movie. Even a marginal improvement over Netflix’s then-current recommendation algorithm could result in millions more hours of users’ eyes being glued to their televisions, computers, and mobile devices, with Netflix’s for-fee content steadily streamed as users watch the recommended movies. The success of these news and entertainment content recommender systems has given rise to the concept of “Binge Watching,” where a user moves from one recommended movie to the next in a never-ending stream of content.
Though Netflix’s movie recommendation and Amazon’s product recommendation systems are the applications that most data scientists associate with Collaborative Filtering and Matrix Factorization, there are several alternative applications of Matrix Factorization that are rarely considered among the community and are missed opportunities for the application of the technique. Other applications include:
Matrix Factorization is a predictive technique that uses the inherent correlations between the rows and columns of a matrix to discover the latent, or hidden, factors in the data that can be leveraged for prediction. This is accomplished by decomposing the original matrix into two smaller matrices whose shared dimension (i.e., the rank of the factorization) is the number of discovered latent factors.
The following figure (Aggarwal 2016) is an example of 7 individual users’ movie preferences, given in matrix R as the values 1, -1, and 0, representing a user’s Like, Dislike, and Neutral preference for a movie. The matrices U and V (V is transposed in the figure to support the matrix multiplication) are the latent factor matrices. The discovered latent factors in this example are the movie genres History and Romance. Note that the matrix factorization process does not automatically provide labels for the discovered latent factors. History and Romance are supplied here as labels so the reader can follow how the data supports these latent factors.
Notice that the preferences of Users 1, 2, and 3 for Nero, Julius Caesar, and Cleopatra, together with their neutrality toward Sleepless in Seattle, Pretty Woman, and Casablanca, indicate a preference for History. User 4 appears to like both History and Romance movies, while Users 5, 6, and 7 prefer Romances. However, User 6 evidently finds the movie Cleopatra to be somewhat romantic.
The utility of this example is limited to being illustrative of the components to be considered in Matrix Factorization. The value of Matrix Factorization is realized when there are missing ratings in the matrix that we wish to predict using the discovered latent factors. Consider that if we can complete the matrices U and V, then we can multiply them to determine the (approximate) values of the original matrix R.
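To make the multiplication step concrete, here is a minimal Python sketch. The values in U and V below are hypothetical and chosen only for illustration. They are not the values from the figure, and the two factor labels are simply borrowed from the example above.

```python
# Hypothetical latent factor matrices (illustrative values only, not taken from the figure).
# Each row of U is a user; the two columns are that user's affinity for
# the "History" and "Romance" factors.
U = [
    [1, 0],    # likes History only
    [1, 1],    # likes both History and Romance
    [-1, 1],   # dislikes History, likes Romance
]

# Each row of V is a movie; the two columns say how strongly the movie
# expresses each factor.
V = [
    [1, 0],    # Nero: History
    [1, 0],    # Julius Caesar: History
    [1, 1],    # Cleopatra: History and Romance
    [0, 1],    # Pretty Woman: Romance
]


def reconstruct(U, V):
    """Return the product U x V^T as a list of rows (users x movies)."""
    return [[sum(u_f * v_f for u_f, v_f in zip(u, v)) for v in V] for u in U]


R_hat = reconstruct(U, V)
for row in R_hat:
    print(row)
```

Each entry of the result is the dot product of one user's factor row with one movie's factor row, so a user who scores high on a factor gets a high predicted rating for movies that express that factor.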
Even better, the discovery of these latent factors is a fully automated process!
How are the latent factor matrices discovered?
The factorization can be written as

$$\hat{R} = U V^{T}$$

where $\hat{R}$ is a predicted approximation of the original matrix R, and U and V are the latent factor matrices that, when multiplied (with V transposed), approximate R. The task at hand is to search for appropriate values for the matrices U and V.
There are several approaches to achieving such an approximation, including minimizing an objective function with the Gradient Descent search algorithm. The chosen objective function is one that simply represents the distance between the original matrix R and the product of the matrices U and V. This difference can be thought of as the overall prediction error, e, and is represented by the equation below.
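One common way to write such an objective, assuming squared Euclidean distance summed over the observed entries, is the following. The index set $\Omega$ and the lowercase entry symbols are notation introduced here for illustration.

$$e = \sum_{(i,j) \in \Omega} \left( r_{ij} - \sum_{f=1}^{k} u_{if}\, v_{jf} \right)^{2}$$

Here $r_{ij}$ is the known rating in row i and column j of R, $u_{if}$ and $v_{jf}$ are entries of U and V, k is the number of latent factors, and $\Omega$ is the set of positions in R that hold a known rating. When every entry of R is known, this is the squared Frobenius norm of $R - U V^{T}$.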
As the search for values proceeds through the iterations of Gradient Descent, the values of matrices U and V are adjusted toward the goal of minimizing the error until a user-specified stopping condition is met (e.g., iteration count, error-tolerance). At this point, we accept the matrices U and V as our latent factor matrices that can now be used for prediction of missing values such as a predicted movie rating for a user of a streaming movie service or a product rating for a customer of an e-commerce site. High rating predictions are considered strong indicators of propensity to select or buy.
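The search described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the toy matrix is hypothetical and built to be exactly rank 2, the error is the sum of squared differences over known entries, `None` marks a missing rating, and the function names, learning rate, iteration limit, and tolerance are all illustrative choices.

```python
import random


def predict(U, V, i, j):
    """Predicted rating for user i and item j: dot product of their factor rows."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def squared_error(R, U, V, observed):
    """Sum of squared differences over the observed entries only."""
    return sum((R[i][j] - predict(U, V, i, j)) ** 2 for i, j in observed)


def factorize(R, k=2, lr=0.02, max_iters=5000, tol=1e-8, seed=0):
    """Batch gradient descent for R ~ U x V^T; None marks a missing rating."""
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    # Start from small random values for both factor matrices.
    U = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(m)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n)]
    observed = [(i, j) for i in range(m) for j in range(n) if R[i][j] is not None]

    for _ in range(max_iters):
        # Stopping condition: error tolerance (the loop bound is the iteration count).
        if squared_error(R, U, V, observed) < tol:
            break
        # Accumulate the gradient of the squared error with respect to U and V.
        grad_U = [[0.0] * k for _ in range(m)]
        grad_V = [[0.0] * k for _ in range(n)]
        for i, j in observed:
            diff = R[i][j] - predict(U, V, i, j)
            for f in range(k):
                grad_U[i][f] -= 2 * diff * V[j][f]
                grad_V[j][f] -= 2 * diff * U[i][f]
        # Step both matrices against their gradients.
        for i in range(m):
            for f in range(k):
                U[i][f] -= lr * grad_U[i][f]
        for j in range(n):
            for f in range(k):
                V[j][f] -= lr * grad_V[j][f]
    return U, V, squared_error(R, U, V, observed)


# Hypothetical ratings; the hidden entry in the last row was 1 before removal.
R = [
    [1, 1, 1, 0],
    [1, 1, 2, 1],
    [-1, -1, 0, 1],
    [0, 0, None, 1],
]
U, V, error = factorize(R)
predicted = predict(U, V, 3, 2)
print(f"final error: {error:.6f}")
print(f"predicted rating for the missing entry: {predicted:.3f}")
```

Only the known entries contribute to the error and the gradient, which is what lets the learned factors fill in the missing rating. Practical implementations usually add a regularization term and often use stochastic updates, both of which are omitted here for brevity.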
As previously mentioned, Matrix Factorization has recently been popularized by its successful use in Recommender Systems. However, there are other useful applications to be discussed.
Next in this series on Matrix Factorization, we will go in depth into the deployment of Matrix Factorization as a Data Imputation and Augmentation micro-service and how NuWave applied it to the development of the Virtual Anticipation Network (VANE) for the Army G2.