
Recommending users for other users based on their preferences by using Apache Mahout

This is my first question on stackoverflow.com, so I apologize in advance for any mistakes.

I'm trying to create a recommendation engine in Java using Apache Mahout. I have an input file like the one shown below (the real one will be much larger, of course):

 userID1 ItemID1  Rating1
 userID1 ItemID2  Rating2
 userID2 ItemID1  Rating3
 userID2 ItemID3  Rating4
 userID3 ItemID4  Rating5
 userID4 ItemID2  Rating6

For each user, I want to recommend other users based on their ratings of items. Let's say that at the end of my program the output will be:

userID1  similar to userID2  with score of 0.8 (the score could be a value between 0 and 1 or a percentage; the only requirement is that it be reasonable)
userID1  similar to userID3  with score of 0.7
userID2  similar to userID1  with score of 0.8
userID2  similar to userID4  with score of 0.5
userID3  similar to userID1  with score of 0.7
userID4  similar to userID2  with score of 0.5

And so on. For this purpose, I've written the following code.

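import java.io.File;
import java.io.IOException;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// FileDataModel can throw IOException; the Taste calls can throw TasteException.
public void RecommenderFunction() throws IOException, TasteException
{
        DataModel model = new FileDataModel(new File("data/dataset.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Neighborhood defined by a similarity threshold of 0.
        UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0, similarity, model);
        UserBasedRecommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        for (LongPrimitiveIterator users = model.getUserIDs(); users.hasNext();)
        {
            long userId = users.nextLong();
            // I want all similar users, not a subset of them, so I pass 100 as the second argument.
            long[] recommendedUserIDs = recommender.mostSimilarUserIDs(userId, 100);

            for (long recID : recommendedUserIDs)
            {
                System.out.println("user:" + userId + " similar with:" + recID);
            }
        }
}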

This is my dataset.csv file:

1,10,1.0
1,11,2.0
1,12,5.0
1,13,5.0
1,14,5.0
1,15,4.0
1,16,5.0
1,17,1.0
1,18,5.0
2,10,1.0
2,11,2.0
2,15,5.0
2,16,4.5
2,17,1.0
2,18,5.0
3,11,2.5
3,12,4.5
3,13,4.0
3,14,3.0
3,15,3.5
3,16,4.5
3,17,4.0
3,18,5.0
4,10,5.0
4,11,5.0
4,12,5.0
4,13,0.0
4,14,2.0
4,15,3.0
4,16,1.0
4,17,4.0
4,18,1.0

And this is the output of my program for this dataset:

user:1 similar with:2
user:1 similar with:3
user:1 similar with:4
user:2 similar with:1
user:2 similar with:3
user:2 similar with:4
user:3 similar with:2
user:3 similar with:1
user:3 similar with:4
user:4 similar with:3
user:4 similar with:1
user:4 similar with:2

I know that, because I passed 100 as the second argument to the function above, the recommender returns every pair of users as similar to each other. My question begins here. My program can tell me which users are similar to each other, but I could not find a way to get their similarity scores. How can I do that?

EDIT

I think the Pearson correlation similarity results might be used to validate the recommendations. Is my logic wrong? I modified the code above in the following way:

public void RecommenderFunction()
{
        // same as above.
        for (LongPrimitiveIterator users = model.getUserIDs(); users.hasNext();)
        {
            // same as above.

            for (long recID : recommendedUserIDs)
            {
                // The confidence score of a recommendation is the Pearson correlation score of the two users. Am I wrong?
                System.out.println("user:" + userId + " similar with:" + recID + " score of: " + similarity.userSimilarity(userId, recID));
            }
        }
}
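For reference, the value returned by `similarity.userSimilarity(...)` is a correlation in the range [-1, 1], not [0, 1]. The sketch below is plain Java with no Mahout dependency. It computes a Pearson correlation over the items that both users rated, which is assumed here to approximate what `PearsonCorrelationSimilarity` does; Mahout's result may differ in details. The names `PearsonSketch`, `ratings`, and `toUnitInterval` are hypothetical, and the linear rescaling to [0, 1] is only one possible convention for getting the kind of score asked for above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PearsonSketch {

    /** Pearson correlation computed over the items that both users have rated. */
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // item not rated by the second user
            }
            double x = e.getValue();
            double y = other;
            n++;
            sumX += x;
            sumY += y;
            sumXY += x * y;
            sumX2 += x * x;
            sumY2 += y * y;
        }
        if (n < 2) {
            return Double.NaN; // not enough overlap to correlate
        }
        double cov = sumXY - sumX * sumY / n;
        double varX = sumX2 - sumX * sumX / n;
        double varY = sumY2 - sumY * sumY / n;
        if (varX <= 0 || varY <= 0) {
            return Double.NaN; // one user gave identical ratings to all shared items
        }
        return cov / Math.sqrt(varX * varY);
    }

    /** Hypothetical helper: maps a correlation in [-1, 1] linearly onto [0, 1]. */
    static double toUnitInterval(double r) {
        return (r + 1.0) / 2.0;
    }

    /** Hypothetical helper: builds an item-to-rating map from parallel arrays. */
    static Map<Long, Double> ratings(long[] items, double[] values) {
        Map<Long, Double> m = new LinkedHashMap<>();
        for (int i = 0; i < items.length; i++) {
            m.put(items[i], values[i]);
        }
        return m;
    }

    public static void main(String[] args) {
        // Users 1 and 2 from dataset.csv above.
        Map<Long, Double> user1 = ratings(
                new long[] {10, 11, 12, 13, 14, 15, 16, 17, 18},
                new double[] {1.0, 2.0, 5.0, 5.0, 5.0, 4.0, 5.0, 1.0, 5.0});
        Map<Long, Double> user2 = ratings(
                new long[] {10, 11, 15, 16, 17, 18},
                new double[] {1.0, 2.0, 5.0, 4.5, 1.0, 5.0});
        double r = pearson(user1, user2);
        System.out.println("pearson=" + r + " rescaled=" + toUnitInterval(r));
    }
}
```

A negative correlation means the two users tend to disagree, so after rescaling, values below 0.5 indicate dissimilar users rather than weakly similar ones.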

This is a good start. Remember that user-user similarity values are used to create item recommendations, so you can't reuse the similarity scores to validate recommendation quality. Now that you have user-user similarity scores, use Mahout to generate item recommendations for all of your users.

Once that works, you can test the quality of your recommendations by hiding some of the data from your recommender, seeing what it predicts for the hidden ratings, and then measuring how close the predictions are to the actual values. This is one form of recommender evaluation (among many), called predictive accuracy. A common metric is RMSE, or root mean squared error. With a metric like that, you'll be able to see how well your recommender performs.
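As a rough illustration of the metric itself, here is a minimal plain-Java sketch of RMSE over a set of hidden ratings. The class name `RmseSketch` and the numbers in `main` are hypothetical and only show the arithmetic; in a real evaluation the predictions would come from the recommender after some ratings were withheld. Mahout's Taste package also ships evaluator classes such as `RMSRecommenderEvaluator` that automate the hold-out split; check the API documentation for your version before relying on them.

```java
final class RmseSketch {

    /** Root mean squared error between predicted and actual ratings. */
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSquaredError = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double error = predicted[i] - actual[i];
            sumSquaredError += error * error;
        }
        return Math.sqrt(sumSquaredError / predicted.length);
    }

    public static void main(String[] args) {
        // Hypothetical values: three ratings hidden from the recommender,
        // and the ratings it predicted for them.
        double[] actual = {5.0, 4.5, 1.0};
        double[] predicted = {4.0, 4.5, 2.0};
        System.out.println("RMSE = " + rmse(predicted, actual));
    }
}
```

A lower RMSE means the predictions are closer to the hidden ratings, and a value of 0 means every hidden rating was predicted exactly.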
