
Mahout: Using two DataModels in one Recommender

I am trying to create a simple recommendation engine with two sets of boolean preference data. I want to use one data set to calculate UserSimilarity and UserNeighborhoods, and then use those neighborhoods to make recommendations from a second set of boolean preference data.

I seem to have this working, but there is a problem when calculating recommendations: if a user has neighbors based on the first data set but is not present in the second data set (even though their neighbors are), no recommendations are produced.

Here's the RecommenderBuilder code:

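  recommenderBuilder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel recommendationModel) throws TasteException {
          // User-user similarity is computed from the first (training) data set only.
          UserSimilarity similarity = new LogLikelihoodSimilarity(trainingModel);
          // Neighbors are searched among the users of the second data set:
          // at most 10 users, each with similarity >= 0.7.
          UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, 0.7, similarity, recommendationModel);

          // Recommended items come from the second data set.
          return new GenericBooleanPrefUserBasedRecommender(recommendationModel, neighborhood, similarity);
      }
  };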

And here's a sample of the trainingModel file:

1,111
2,222

2,111
2,222

3,111
3,222

And here's the recommendationModel file:

1,91
1,92

2,91

Running this recommends 92 for user 2, but throws a NoSuchUserException when it gets to user 3.
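To make the failure mode concrete, here is a small, library-free Java model of the setup. `TwoModelRecommender` is a hypothetical class, not Mahout's API. Similarity is computed only from the training data, while neighbors and candidate items come from the recommendation data. Two assumptions to flag: Tanimoto (Jaccard) overlap stands in for log-likelihood similarity, and the user's own items are looked up strictly in the recommendation data, which is, as far as I can tell, the step in Mahout's `GenericUserBasedRecommender` where the `NoSuchUserException` for user 3 originates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy stand-in for the question's setup (hypothetical class, not part of Mahout). */
final class TwoModelRecommender {
    private final Map<Long, Set<Long>> training;       // user -> items, used only for similarity
    private final Map<Long, Set<Long>> recommendation; // user -> items, source of recommendations
    private final int n;                               // maximum neighborhood size
    private final double minSimilarity;                // neighbors must score at least this

    TwoModelRecommender(Map<Long, Set<Long>> training, Map<Long, Set<Long>> recommendation,
                        int n, double minSimilarity) {
        this.training = training;
        this.recommendation = recommendation;
        this.n = n;
        this.minSimilarity = minSimilarity;
    }

    /** Parses "userID,itemID" lines into a user -> item-set map, skipping blank lines. */
    static Map<Long, Set<Long>> parse(String csv) {
        Map<Long, Set<Long>> data = new TreeMap<>();
        for (String line : csv.split("\\R")) {
            if (line.isBlank()) continue;
            String[] fields = line.split(",");
            data.computeIfAbsent(Long.parseLong(fields[0].trim()), k -> new TreeSet<>())
                .add(Long.parseLong(fields[1].trim()));
        }
        return data;
    }

    /** Strict lookup that throws for unknown users, as a DataModel does. */
    private static Set<Long> itemsOf(Map<Long, Set<Long>> data, long user) {
        Set<Long> items = data.get(user);
        if (items == null) throw new NoSuchElementException("no such user: " + user);
        return items;
    }

    /** Tanimoto overlap on the TRAINING data (assumed stand-in for log-likelihood). */
    double similarity(long u, long v) {
        Set<Long> a = itemsOf(training, u);
        Set<Long> b = itemsOf(training, v);
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    List<Long> recommend(long user, int howMany) {
        // Neighborhood: up to n other users of the RECOMMENDATION data, most similar first.
        List<Long> neighbors = new ArrayList<>();
        for (long other : recommendation.keySet()) {
            if (other != user && similarity(user, other) >= minSimilarity) neighbors.add(other);
        }
        neighbors.sort(Comparator.comparingDouble((Long o) -> similarity(user, o)).reversed());
        if (neighbors.size() > n) neighbors = new ArrayList<>(neighbors.subList(0, n));
        if (neighbors.isEmpty()) return List.of();

        // Score each neighbor item by the summed similarity of the neighbors who have it.
        Map<Long, Double> scores = new HashMap<>();
        for (long neighbor : neighbors) {
            double s = similarity(user, neighbor);
            for (long item : recommendation.get(neighbor)) scores.merge(item, s, Double::sum);
        }
        // Drop the user's own items; this strict lookup is what fails for user 3.
        scores.keySet().removeAll(itemsOf(recommendation, user));

        List<Long> ranked = new ArrayList<>(new TreeSet<>(scores.keySet())); // ties: ascending id
        ranked.sort(Comparator.comparingDouble((Long i) -> scores.get(i)).reversed()); // stable
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }
}
```

With the two sample files above and a 0.5 threshold, `recommend(2, 2)` returns `[92]`, while `recommend(3, 2)` throws: both of user 3's neighbors exist in the recommendation data, but user 3 does not. The threshold is lowered from the question's 0.7 only because Tanimoto scores are not on the log-likelihood scale; at 0.7 this toy finds no neighbor for user 2.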

So, is there any way to produce recommendations from one data set based on similarities calculated on another data set, without requiring every user to be present in the second data set?

Here's the complete code I'm working with right now:

private DataModel trainingModel;
private DataModel recommendationModel;
private RecommenderEvaluator evaluator;
private RecommenderIRStatsEvaluator evaluator2;
private RecommenderBuilder recommenderBuilder;
private DataModelBuilder modelBuilder;

@Override
public void afterPropertiesSet() throws IOException, TasteException {

    // First data set: used only to compute user-user similarity.
    trainingModel = new GenericBooleanPrefDataModel(
        GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("/music.csv")))
    );

    // Second data set: supplies the neighbors' items that get recommended.
    recommendationModel = new GenericBooleanPrefDataModel(
        GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("/movies.csv")))
    );

    evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    evaluator2 = new GenericRecommenderIRStatsEvaluator();

    recommenderBuilder = new RecommenderBuilder() {
        public Recommender buildRecommender(DataModel model) throws TasteException {
            // Similarity always comes from the full trainingModel field,
            // not from the model handed in by the caller or the evaluator.
            UserSimilarity similarity = new LogLikelihoodSimilarity(trainingModel);
            // Neighbors and recommended items come from the model passed in.
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, 0.7, similarity, model);

            return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
        }
    };

    // Rebuilds a boolean-preference model from each training split during evaluation.
    modelBuilder = new DataModelBuilder() {
        public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
            return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
        }
    };

}

And then running this method:

    @Override
    public void testData() throws TasteException {

        double score = evaluator.evaluate(recommenderBuilder, modelBuilder, trainingModel, 0.9, 1.0);
        System.out.println("calculated score: " + score);

        try {
            IRStatistics stats = evaluator2.evaluate(
                    recommenderBuilder, modelBuilder, trainingModel, null, 2,
                    0.0,
                    1.0
            );
            System.out.println("recall: " + stats.getRecall());
            System.out.println("precision: " + stats.getPrecision());
        } catch (Throwable t) {
            System.out.println("throwing " + t);
        }

        List<RecommendedItem> recommendations = recommenderBuilder.buildRecommender(recommendationModel).recommend(1,2);
        System.out.println("user 1");
        for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation);}

        recommendations = recommenderBuilder.buildRecommender(recommendationModel).recommend(2,2);
        System.out.println("user 2");
        for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation);}

        try {
            recommendations = recommenderBuilder.buildRecommender(recommendationModel).recommend(3,2);
            System.out.println("user 3");
            for (RecommendedItem recommendation : recommendations) { System.out.println(recommendation);}
        } catch (Throwable t) {
            System.out.println("throwing " + t);
        }
    }

This produces the following output:

    calculated score: 0.7033357620239258
    recall: 1.0
    precision: 1.0
    user 1
    user 2
    RecommendedItem[item:9222, value:0.8516679]
    throwing org.apache.mahout.cf.taste.common.NoSuchUserException: 3

You can do what you are describing, and roughly how you are describing it. The data set that powers the user similarity metric could indeed be different from the data set over which recommendations are made. The user similarity metric could in fact be based on anything you like.

However, it does need to be able to produce a user-user similarity for any pair of users in the data set used to make recommendations. I suggest you simply special-case this in your UserSimilarity implementation so that it returns 0 (or some other neutral value) when one of the two users is unknown.
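A minimal, library-free sketch of that suggestion, assuming the set of users known to the similarity data is available up front. `FallbackSimilarity` and `tanimoto` are hypothetical names, and Tanimoto overlap is only an example delegate. In Mahout the same check would live in your own `UserSimilarity` implementation, delegating to something like `LogLikelihoodSimilarity`.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical wrapper (not Mahout's API): neutral score for users it does not know. */
final class FallbackSimilarity {
    private final Set<Long> knownUsers;                    // users present in the similarity data
    private final ToDoubleBiFunction<Long, Long> delegate; // the real similarity metric

    FallbackSimilarity(Set<Long> knownUsers, ToDoubleBiFunction<Long, Long> delegate) {
        this.knownUsers = knownUsers;
        this.delegate = delegate;
    }

    /** Delegates only when both users are known; otherwise returns 0.0 instead of failing. */
    double userSimilarity(long u, long v) {
        if (!knownUsers.contains(u) || !knownUsers.contains(v)) return 0.0;
        return delegate.applyAsDouble(u, v);
    }

    /** Example delegate: Tanimoto (Jaccard) overlap of two users' item sets. */
    static double tanimoto(Map<Long, Set<Long>> data, long u, long v) {
        Set<Long> common = new HashSet<>(data.get(u));
        common.retainAll(data.get(v));
        int union = data.get(u).size() + data.get(v).size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }
}
```

Returning 0.0 means that, assuming the neighborhood keeps only users scoring at least a positive minimum such as the 0.7 in the question, an unknown user is never selected as a neighbor. Note that this wrapper only guards the similarity computation.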


 