For their utility in retail and business systems, Recommender Systems are one of the most popular applications of predictive analytics. A Recommender System is a process that leverages data from the behaviors of individuals and attempts to make personalized predictions of a targeted individual’s future behaviors. Well-known and now obvious applications of a Recommender System include predicting a customer’s preferences for products they may buy so that they can be recommended for purchase. Another well-known and studied application is the prediction of movies that are of interest to an individual based upon their own movie viewing habits and the viewing habits of other individuals that may share some common attitude or trait that indicates some shared interest in specific movies.

Though there are several methods for implementing a Recommender System, Collaborative Filtering has become the de facto standard. Collaborative Filtering is an approach to Recommender Systems that is commonly implemented with a machine learning technique called *Matrix Factorization*, which was made famous within the data science community through its use in the winning solution to the one-million-dollar Netflix Challenge. In case you missed it, the goal of the Netflix Challenge was to improve on the algorithm behind the streaming movie service’s recommender system, which makes personalized viewing recommendations to its users. More specifically, a better solution would be one that more accurately predicts the rating a specific user would give a specific movie. Even a marginal improvement in Netflix’s recommendation algorithm could result in millions more hours of users’ eyes being glued to their televisions, computers, and mobile devices, with Netflix’s for-fee content streaming steadily as users watch the recommended movies. The success of these news and entertainment content recommender systems has given rise to the concept of “Binge Watching,” where a user moves from one recommended movie to the next in a never-ending stream of content.

Though Netflix’s movie recommendation and Amazon’s product recommendation systems are the applications that most data scientists associate with Collaborative Filtering and Matrix Factorization, there are several alternative applications of Matrix Factorization that are rarely considered among the community and are missed opportunities for the application of the technique. Other applications include:

- Imputation of missing/incomplete data
- Imaging: Segmentation and Noise Removal
- Text Mining/Topic Modeling

**Matrix Factorization**

Matrix Factorization is a predictive technique that makes use of the inherent correlations between the rows and columns of a matrix to discover the *latent or hidden factors* in the data that can be leveraged for prediction. This is accomplished by decomposing the original matrix into two smaller matrices whose shared dimension (i.e., the rank of the factorization) is the number of latent factors.

The following figure (Aggarwal 2016) is an example of 7 individual users’ movie preferences, given in matrix *R* as the values 1, -1, and 0, representing a user’s Like, Dislike, and Neutral preference for a movie. The matrices *U* and *V* (transposed in the figure to support the matrix multiplication) are the latent factor matrices. The latent factors in this example are the movie genres History and Romance. Note that the matrix factorization process does not automatically provide labels for the latent factors it discovers. History and Romance are supplied here so that the reader can follow how the data supports these latent factors.

Notice that Users 1, 2, and 3’s preferences for Nero, Julius Caesar, and Cleopatra, together with their neutrality toward Sleepless in Seattle, Pretty Woman, and Casablanca, indicate a preference for History. User 4 appears to like both History and Romance movies, while Users 5, 6, and 7 prefer Romances. However, User 6 evidently finds the movie Cleopatra to be somewhat romantic.

This example only illustrates the components involved in Matrix Factorization. The real value of Matrix Factorization is realized when there are missing ratings in the matrix that we wish to predict using the discovered latent factors. If we can complete the matrices *U* and *V*, then we can multiply them to determine the (approximate) values of the original matrix *R*, including the entries that were missing.
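The multiplication step can be sketched in a few lines of Python. The factor values below are hypothetical: they are chosen to match the History/Romance narrative above and are not copied from the figure. The helper name `matmul_transposed` is likewise just for illustration.

```python
def matmul_transposed(U, V):
    """Return the product U V^T for nested lists U (m x k) and V (n x k)."""
    return [
        [sum(u_f * v_f for u_f, v_f in zip(u_row, v_row)) for v_row in V]
        for u_row in U
    ]

# Hypothetical user factors; columns are [History, Romance].
U = [
    [1, 0],   # User 1: likes History, neutral on Romance
    [1, 0],   # User 2
    [1, 0],   # User 3
    [1, 1],   # User 4: likes both genres
    [-1, 1],  # User 5: dislikes History, likes Romance
    [-1, 1],  # User 6
    [-1, 1],  # User 7
]

# Hypothetical movie factors; columns are [History, Romance].
V = [
    [1, 0],  # Nero
    [1, 0],  # Julius Caesar
    [1, 0],  # Cleopatra
    [0, 1],  # Sleepless in Seattle
    [0, 1],  # Pretty Woman
    [0, 1],  # Casablanca
]

# Each entry of R_hat is the dot product of one user row and one movie row.
R_hat = matmul_transposed(U, V)
for row in R_hat:
    print(row)
```

Every entry of the reconstructed matrix is simply how strongly a user's genre preferences line up with a movie's genre membership, which is why two small factor matrices can stand in for the full ratings matrix.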

Even better, the discovery of these latent factors is a fully automated process!

**How are the latent factor matrices discovered?**

The math of Matrix Factorization is straightforward and based on the following equation:

$$\hat{R} = U V^{T}$$

where $\hat{R}$ is the predicted approximation of the original matrix *R*, and *U* and *V* are the latent factor matrices whose product approximates *R* (*V* is transposed so that the dimensions agree). The task at hand is to search for appropriate values for the matrices *U* and *V*.

There are several approaches to achieving such an approximation, including minimizing an objective function using the *Gradient Descent* search algorithm. The chosen objective function is one that simply represents the distance between the original matrix *R* and the product of the matrices *U* and *V*. This difference can be thought of as the overall prediction error, *e*, and is represented by the equation below.

$$e = \lVert R - U V^{T} \rVert^{2}$$
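To make the search concrete, a single step can be written out. This is a sketch under two assumptions that the text does not fix: the error is taken as the sum of squared differences over the observed entries of *R*, and $\alpha$ is a hypothetical symbol for the step size. With $u_i$ the $i$-th row of *U*, $v_j$ the $j$-th row of *V*, and $r_{ij}$ an observed entry of *R*:

$$
\begin{aligned}
e &= \sum_{(i,j)\,\text{observed}} \left(r_{ij} - u_i \cdot v_j\right)^{2} \\
\frac{\partial e}{\partial u_i} &= -2 \sum_{j:\,(i,j)\,\text{observed}} \left(r_{ij} - u_i \cdot v_j\right) v_j,
\qquad
\frac{\partial e}{\partial v_j} = -2 \sum_{i:\,(i,j)\,\text{observed}} \left(r_{ij} - u_i \cdot v_j\right) u_i \\
u_i &\leftarrow u_i - \alpha \frac{\partial e}{\partial u_i},
\qquad
v_j \leftarrow v_j - \alpha \frac{\partial e}{\partial v_j}
\end{aligned}
$$

Each update nudges a user row or a movie row in the direction that shrinks the residuals on the ratings that are actually known. Missing ratings contribute nothing to the sums, which is what later allows them to be predicted rather than fitted.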

As the search proceeds through the iterations of Gradient Descent, the values of matrices *U* and *V* are adjusted to reduce the error until a user-specified stopping condition is met (e.g., an iteration count or an error tolerance). At this point, we accept the matrices *U* and *V* as our latent factor matrices, which can now be used to predict missing values such as a movie rating for a user of a streaming movie service or a product rating for a customer of an e-commerce site. High predicted ratings are considered strong indicators of propensity to select or buy.
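The procedure described above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names, the learning rate, the tolerance, the use of `None` to mark a missing rating, and the toy ratings matrix are all assumptions made for the example.

```python
import random

def predict(U, V, i, j):
    """Predicted rating: dot product of user row i and movie row j."""
    return sum(u * v for u, v in zip(U[i], V[j]))

def squared_error(R, U, V, observed):
    """Sum of squared residuals over the observed entries only."""
    return sum((R[i][j] - predict(U, V, i, j)) ** 2 for i, j in observed)

def factorize(R, k=2, lr=0.02, max_iters=5000, tol=1e-6, seed=0):
    """Fit U (m x k) and V (n x k) by full-batch gradient descent.

    Entries of R equal to None are treated as missing and ignored.
    Returns U, V, and the error recorded after each iteration.
    """
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    # Small random starting values for both factor matrices.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    observed = [(i, j) for i in range(m) for j in range(n) if R[i][j] is not None]
    history = [squared_error(R, U, V, observed)]
    for _ in range(max_iters):  # stopping condition 1: iteration count
        grad_U = [[0.0] * k for _ in range(m)]
        grad_V = [[0.0] * k for _ in range(n)]
        for i, j in observed:
            resid = R[i][j] - predict(U, V, i, j)
            for f in range(k):
                grad_U[i][f] -= 2 * resid * V[j][f]
                grad_V[j][f] -= 2 * resid * U[i][f]
        # Step both matrices against their gradients.
        for i in range(m):
            for f in range(k):
                U[i][f] -= lr * grad_U[i][f]
        for j in range(n):
            for f in range(k):
                V[j][f] -= lr * grad_V[j][f]
        history.append(squared_error(R, U, V, observed))
        if history[-1] < tol:  # stopping condition 2: error tolerance
            break
    return U, V, history

MISSING = None
# Hypothetical ratings (1 = Like, -1 = Dislike, 0 = Neutral), two entries hidden.
R = [
    [1, 1, MISSING, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [-1, -1, -1, 1, 1, 1],
    [-1, -1, -1, MISSING, 1, 1],
    [-1, -1, -1, 1, 1, 1],
]

U, V, history = factorize(R)
print("initial error:", history[0])
print("final error:", history[-1])
print("predicted rating, user 1 / movie 3:", predict(U, V, 0, 2))
print("predicted rating, user 6 / movie 4:", predict(U, V, 5, 3))
```

The two hidden entries never enter the error, so the values predicted for them come entirely from the patterns learned from the other users and movies. Real systems typically add regularization and use stochastic updates, both of which are omitted here for brevity.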

**Alternative Applications**

As previously mentioned, Matrix Factorization has recently been popularized by its successful use in Recommender Systems. However, there are other useful applications to be discussed.

Next in this series on Matrix Factorization, we will go in depth into the deployment of Matrix Factorization as a Data Imputation and Augmentation micro-service and how NuWave applied it to the development of the Virtual Anticipation Network (VANE) for the Army G2.