
Recommending similar users to each user based on their preferences using Apache Mahout

This is my first question on stackoverflow.com, so I apologize in advance for any mistakes.

I'm trying to build a recommendation engine in Java using Apache Mahout. My input file looks like this (the real one will be much larger, of course):

 userID1 ItemID1  Rating1
 userID1 ItemID2  Rating2
 userID2 ItemID1  Rating3
 userID2 ItemID3  Rating4
 userID3 ItemID4  Rating5
 userID4 ItemID2  Rating6

For each user, I want to recommend other users based on their ratings of items. Let's say the output at the end of my program should be:

userID1  similar to userID2  with score of 0.8 (the score could be a value between 0 and 1 or a percentage; the only requirement is that it is reasonable)
userID1  similar to userID3  with score of 0.7
userID2  similar to userID1  with score of 0.8
userID2  similar to userID4  with score of 0.5
userID3  similar to userID1  with score of 0.7
userID4  similar to userID2  with score of 0.5

And so on. For this purpose, I've written the following code.

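import java.io.File;
import java.io.IOException;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public void RecommenderFunction() throws IOException, TasteException
{
    // Load the userID,itemID,rating triples from the CSV file.
    DataModel model = new FileDataModel(new File("data/dataset.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // Threshold 0: keep every user whose similarity to the target user is at least 0.
    UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0, similarity, model);
    UserBasedRecommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

    for (LongPrimitiveIterator users = model.getUserIDs(); users.hasNext();)
    {
        long userId = users.nextLong();
        // I want all similar user IDs, not a subset of them; that's why I pass 100 as the second argument.
        long[] recommendedUserIDs = recommender.mostSimilarUserIDs(userId, 100);

        for (long recID : recommendedUserIDs)
        {
            System.out.println("user:" + userId + " similar with:" + recID);
        }
    }
}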

This is my dataset.csv file

1,10,1.0
1,11,2.0
1,12,5.0
1,13,5.0
1,14,5.0
1,15,4.0
1,16,5.0
1,17,1.0
1,18,5.0
2,10,1.0
2,11,2.0
2,15,5.0
2,16,4.5
2,17,1.0
2,18,5.0
3,11,2.5
3,12,4.5
3,13,4.0
3,14,3.0
3,15,3.5
3,16,4.5
3,17,4.0
3,18,5.0
4,10,5.0
4,11,5.0
4,12,5.0
4,13,0.0
4,14,2.0
4,15,3.0
4,16,1.0
4,17,4.0
4,18,1.0

And this is the result of my program for this dataset:

user:1 similar with:2
user:1 similar with:3
user:1 similar with:4
user:2 similar with:1
user:2 similar with:3
user:2 similar with:4
user:3 similar with:2
user:3 similar with:1
user:3 similar with:4
user:4 similar with:3
user:4 similar with:1
user:4 similar with:2

I know that, since I pass 100 as the second argument to the function above, the recommender returns every pair of users as similar to each other. My question begins here: my program can tell me which users are similar to each other, but I could not find a way to get their similarity scores. How can I do that?

EDIT

I think the Pearson correlation similarity results might be used to validate the recommendations. Is my logic wrong? I modified the code above as follows:

public void RecommenderFunction() throws IOException, TasteException
{
    // same as above.
    for (LongPrimitiveIterator users = model.getUserIDs(); users.hasNext();)
    {
        // same as above.

        for (long recID : recommendedUserIDs)
        {
            // The confidence score of the recommendation is the Pearson correlation score of the two users. Am I wrong?
            System.out.println("user:" + userId + " similar with:" + recID + " score of: " + similarity.userSimilarity(userId, recID));
        }
    }
}
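To make it concrete what a Pearson score between two users represents, here is a minimal plain-Java sketch that does not use Mahout. It computes the correlation over the items both users rated, which is my understanding of what PearsonCorrelationSimilarity does conceptually; the class name PearsonSketch, the helper toUnitInterval, and the toy ratings are assumptions made for illustration. Pearson values lie in [-1, 1], so if a score between 0 and 1 is wanted, one simple option (an assumption, not something Mahout does for you) is the linear rescaling (r + 1) / 2.

```java
import java.util.HashMap;
import java.util.Map;

public class PearsonSketch {

    /**
     * Pearson correlation over the items rated by both users.
     * Returns NaN when fewer than two items overlap or one side has zero variance.
     */
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb == null) {
                continue; // only co-rated items contribute
            }
            double ra = e.getValue();
            n++;
            sumA += ra;
            sumB += rb;
            sumAA += ra * ra;
            sumBB += rb * rb;
            sumAB += ra * rb;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? Double.NaN : cov / denom;
    }

    /** Hypothetical helper: maps a correlation in [-1, 1] linearly onto [0, 1]. */
    static double toUnitInterval(double r) {
        return (r + 1.0) / 2.0;
    }

    public static void main(String[] args) {
        // Toy ratings keyed by item ID (illustrative values only).
        Map<Long, Double> u1 = new HashMap<>();
        u1.put(10L, 1.0);
        u1.put(11L, 2.0);
        u1.put(12L, 5.0);
        Map<Long, Double> u2 = new HashMap<>();
        u2.put(10L, 2.0);
        u2.put(11L, 4.0);
        u2.put(13L, 3.0);

        double r = pearson(u1, u2);
        System.out.println("pearson=" + r + " rescaled=" + toUnitInterval(r));
    }
}
```

With very few co-rated items the correlation is easy to saturate at 1 or -1, so scores from small overlaps should be read with caution.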

This is a good start. Remember that user-user similarity values are used to create item recommendations, so you can't reuse the similarity scores to validate recommendation quality. Now that you have user-user similarity scores, use Mahout to generate item recommendations for all of your users. Once that works, you can test the quality of your recommendations by hiding some of the data from your recommender, seeing what it predicts for those hidden ratings, and then measuring how close the predictions are to the actual values. This is one form of recommender evaluation (among many), and it's called predictive accuracy. A common metric is RMSE, or root mean squared error. With a metric like that, you'll be able to see how well your recommender performs.
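As a rough illustration of the RMSE metric described above, the following plain-Java sketch compares a few hidden ratings with the values a recommender predicted for them. The class name RmseSketch and the numbers are made up for illustration; in a real setup the predictions would come from your Mahout recommender, and Mahout also ships an RMSRecommenderEvaluator class that is worth checking for doing the hold-out split for you.

```java
public class RmseSketch {

    /** Root mean squared error between held-out ratings and their predictions. */
    static double rmse(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSquaredError = 0;
        for (int i = 0; i < actual.length; i++) {
            double error = predicted[i] - actual[i];
            sumSquaredError += error * error;
        }
        return Math.sqrt(sumSquaredError / actual.length);
    }

    public static void main(String[] args) {
        // Ratings hidden from the recommender (illustrative values only).
        double[] hidden = {5.0, 3.0, 1.0};
        // What the recommender predicted for those same user/item pairs (made up).
        double[] predicted = {4.0, 3.0, 3.0};
        System.out.println("RMSE = " + rmse(hidden, predicted));
    }
}
```

A lower RMSE means the predictions are closer to the hidden ratings; comparing the value across similarity measures or neighborhood settings is one way to choose between them.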
