
Implementing SVD recommender in Mahout

I have a dataset of 50 million user preferences covering 8 million distinct users and 180K distinct products. I am currently using a boolean data model with a basic Tanimoto-similarity-based recommender. To explore other algorithms that might give better recommendations, I started with SVD using the ALSWRFactorizer, via the basic SVDRecommender provided in Mahout:

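DataModel dataModel = new FileDataModel(new File("/FilePath")); // FileDataModel expects a java.io.File, not a String

// 50 latent features, lambda (regularization) = 0.065, 15 ALS iterations
ALSWRFactorizer factorizer = new ALSWRFactorizer(dataModel, 50, 0.065, 15);

Recommender recommender = new SVDRecommender(dataModel, factorizer);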

As I understand it, the factorisation runs offline and produces the user feature vectors and item feature vectors. Requests are then served by computing a user's top products from the dot product of the user's vector with every item vector.
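The serving step described above can be sketched in plain Java to show where the request time goes. This is a minimal, hypothetical illustration, not Mahout's SVDRecommender code: the class DotProductTopN and its methods are invented for this example, and factor vectors are assumed to be plain double[] arrays. It scores every item by the dot product of its factor vector with the user's vector and keeps the n best in a min-heap, so each request costs on the order of numItems × numFeatures multiply-adds (about 180,000 × 50 = 9 million with the parameters above).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: brute-force top-N scoring against all item factor vectors. */
class DotProductTopN {

    /** Dot product of two factor vectors of equal length. */
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int f = 0; f < a.length; f++) {
            sum += a[f] * b[f];
        }
        return sum;
    }

    /**
     * Returns the indices of the n items with the highest predicted score,
     * best first. itemFeatures[i] is the factor vector of item i.
     */
    static List<Integer> topN(double[] userFeatures, double[][] itemFeatures, int n) {
        // Min-heap ordered by score: the root is the weakest of the current top n.
        PriorityQueue<double[]> heap =
                new PriorityQueue<>((x, y) -> Double.compare(x[1], y[1]));
        for (int i = 0; i < itemFeatures.length; i++) {
            double score = dot(userFeatures, itemFeatures[i]); // touches every item
            if (heap.size() < n) {
                heap.add(new double[] {i, score});
            } else if (n > 0 && score > heap.peek()[1]) {
                heap.poll(); // evict the current weakest candidate
                heap.add(new double[] {i, score});
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add((int) heap.poll()[0]);
        }
        Collections.reverse(result); // the heap pops the worst item first
        return result;
    }

    public static void main(String[] args) {
        double[] user = {1.0, 0.5};
        double[][] items = {{0.1, 0.2}, {2.0, 0.0}, {0.5, 1.0}, {1.0, 1.0}};
        System.out.println(topN(user, items, 2));
    }
}
```

Running main prints [1, 3]: item 1 scores 2.0 and item 3 scores 1.5. The heap keeps extra memory at O(n), but the loop still visits every item vector, which is why the per-request cost grows with the size of the catalog.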

I have a few questions about this approach:

  1. What is the best way to choose the factorisation parameters, and how long does the factorisation usually take? With the parameters above, the factorisation alone ran for more than 30 minutes.
  2. Is there a way to serve real-time requests faster? Taking the dot product with every item vector results in high request latency. Is there such a thing as offline SVD?
  3. Given the size of my dataset, should I be trying a different factoriser?

I'll answer all of your questions together.

Given the size of your data and the real-time requirement, you should take a different approach.

  1. Compute item-item similarities offline. For items with many ratings this does not need to be done often, because their similarities rarely change; you may want to recalculate more frequently for items with few ratings.
  2. At request time, compute each user's item rating predictions from the precomputed item-item similarity lists. This is relatively cheap because you have far fewer items than users (180K vs. 8 million), and its cost stays roughly constant as long as the number of items does not change much.
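The two steps above can be sketched in plain Java for boolean preferences. This is a minimal, hypothetical illustration: ItemItemSketch and its methods are invented names, the similarity is the Tanimoto coefficient (the one the question already uses), each item keeps only its k most similar neighbors, and a prediction is assumed to be the sum of similarities between a candidate item and the items the user already has. The offline step compares all item pairs naively, which would not scale to 180K items (about 1.6 × 10^10 pairs); in practice it would run as a batch or distributed job (Mahout's Hadoop-based ItemSimilarityJob is one existing option).

```java
import java.util.*;

/** Hypothetical sketch of the two-step item-based approach for boolean preferences. */
class ItemItemSketch {

    /** Tanimoto (Jaccard) coefficient between two sets of user IDs. */
    static double tanimoto(Set<Long> usersA, Set<Long> usersB) {
        int intersection = 0;
        for (Long u : usersA) {
            if (usersB.contains(u)) intersection++;
        }
        int union = usersA.size() + usersB.size() - intersection;
        return union == 0 ? 0.0 : (double) intersection / union;
    }

    /**
     * Offline step: for every item, keep only its k most similar neighbors.
     * itemUsers maps item ID -> IDs of users who interacted with it.
     * Naive all-pairs loop, for illustration only.
     */
    static Map<Long, Map<Long, Double>> topKNeighbors(Map<Long, Set<Long>> itemUsers, int k) {
        Map<Long, Map<Long, Double>> neighbors = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> a : itemUsers.entrySet()) {
            List<Map.Entry<Long, Double>> sims = new ArrayList<>();
            for (Map.Entry<Long, Set<Long>> b : itemUsers.entrySet()) {
                if (a.getKey().equals(b.getKey())) continue;
                double s = tanimoto(a.getValue(), b.getValue());
                if (s > 0) sims.add(new AbstractMap.SimpleEntry<Long, Double>(b.getKey(), s));
            }
            // Highest similarity first; ties broken by item ID for determinism.
            sims.sort((x, y) -> {
                int c = Double.compare(y.getValue(), x.getValue());
                return c != 0 ? c : Long.compare(x.getKey(), y.getKey());
            });
            Map<Long, Double> kept = new HashMap<>();
            for (int i = 0; i < Math.min(k, sims.size()); i++) {
                kept.put(sims.get(i).getKey(), sims.get(i).getValue());
            }
            neighbors.put(a.getKey(), kept);
        }
        return neighbors;
    }

    /**
     * Real-time step: score unseen items by summing their similarity to the
     * items the user already has, then return the n best.
     */
    static List<Long> recommend(Set<Long> userItems, Map<Long, Map<Long, Double>> neighbors, int n) {
        Map<Long, Double> scores = new HashMap<>();
        for (Long item : userItems) {
            for (Map.Entry<Long, Double> e
                    : neighbors.getOrDefault(item, Collections.emptyMap()).entrySet()) {
                if (!userItems.contains(e.getKey())) {
                    scores.merge(e.getKey(), e.getValue(), Double::sum);
                }
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int c = Double.compare(scores.get(y), scores.get(x));
            return c != 0 ? c : Long.compare(x, y);
        });
        return ranked.subList(0, Math.min(n, ranked.size()));
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> itemUsers = new HashMap<>();
        itemUsers.put(1L, new HashSet<>(Arrays.asList(10L, 11L, 12L)));
        itemUsers.put(2L, new HashSet<>(Arrays.asList(10L, 11L)));
        itemUsers.put(3L, new HashSet<>(Arrays.asList(12L, 13L)));
        Map<Long, Map<Long, Double>> neighbors = topKNeighbors(itemUsers, 2); // offline
        System.out.println(recommend(Collections.singleton(2L), neighbors, 5)); // per request
    }
}
```

For the sample data in main, item 2's only neighbor is item 1 (Tanimoto 2/3), so a user who has item 2 is recommended [1]. The request-time work is bounded by the number of items the user has times k, independent of the total number of users, which is what keeps the real-time step cheap.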


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM