
Connect to MongoDB using Apache Mahout

I'm trying to generate recommendations using Apache Mahout, with MongoDB providing the data model through MongoDBDataModel. My code is as follows:

import java.net.UnknownHostException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import com.mongodb.MongoException;

public class usingMongo {
    public static void main(String[] args)
            throws UnknownHostException, MongoException, TasteException {
        final long startTime = System.nanoTime();

        // Build the data model from the "ratings100k" collection in the "test" database.
        MongoDBDataModel model = new MongoDBDataModel("AdamsLaptop", 27017,
                "test", "ratings100k", false, false, null);
        System.out.println("connected to mongo ");

        // User-based recommendations: top 3 items for user 1.
        UserSimilarity UserSim = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.5, UserSim, model);
        UserBasedRecommender UserRecommender = new GenericUserBasedRecommender(model, neighborhood, UserSim);
        List<RecommendedItem> UserRecommendations = UserRecommender.recommend(1, 3);
        for (RecommendedItem recommendation : UserRecommendations) {
            System.out.println("You may like movie " + recommendation.getItemID() + " as a user similar to you also rated it " + recommendation.getValue() + " USER");
        }

        // Item-based recommendations: top 3 items for user 1.
        ItemSimilarity ItemSim = new PearsonCorrelationSimilarity(model); // LogLikelihoodSimilarity(model);
        GenericItemBasedRecommender ItemRecommender = new GenericItemBasedRecommender(model, ItemSim);
        List<RecommendedItem> ItemRecommendations = ItemRecommender.recommend(1, 3);
        for (RecommendedItem recommendation : ItemRecommendations) {
            System.out.println("You may like movie " + recommendation.getItemID() + " as a user similar to you also rated it " + recommendation.getValue() + " ITEM");
        }

        // Elapsed time in nanoseconds.
        final long duration = System.nanoTime() - startTime;
        System.out.println(duration);
    }
}

I can't see where I've gone wrong. After numerous changes and a lot of trial and error, the error message remains the same:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel.getID(MongoDBDataModel.java:743)
    at org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel.buildModel(MongoDBDataModel.java:570)
    at org.apache.mahout.cf.taste.impl.model.mongodb.MongoDBDataModel.<init>(MongoDBDataModel.java:245)
    at recommender.usingMongo.main(usingMongo.java:24)

Any suggestions? Here's an example of my data within MongoDB:

{ "_id" : ObjectId("56ddf61f5960960c333f3dcb"),"userId" : 1, "movieId" : 292, "rating" : 4, "timestamp" : 847116936 }

I successfully integrated MongoDB data into Mahout.

The structure of the data in MongoDB depends on the kind of similarity algorithm you use. For example:

UserSimilarity

MongoDBDataModel datamodel = new MongoDBDataModel("127.0.0.1", 27017, "testing", "ratings", true, true, null);

where user_id and item_id are integer values, preference is a float value, and created_at is a timestamp.

SVDRecommender

Here user_id and item_id are MongoDB objects, preference is a float value, and created_at is a timestamp.

The obvious first check is whether the MongoDB server is running. Judging by the exception, it is. I think the problem lies in the structure of your data.

Use user_id instead of userId, item_id instead of itemId (movieId in your sample document), and preference instead of rating. I don't know if this will make any difference. I used one of the tutorials online, but can't find it at the moment.
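To make the suggested renaming concrete, here is a minimal sketch in plain Java that works on a document represented as a Map, so it runs without Mahout or the MongoDB driver. The class MahoutFieldRenamer and its rename method are hypothetical names introduced for illustration. The source field names come from the sample document in the question, and the created_at target is an assumption based on the field list in the earlier answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Mahout or the MongoDB driver.
final class MahoutFieldRenamer {

    // Field names from the question's sample document, mapped to the names
    // suggested in the answers (created_at is an assumption from the first answer).
    private static final Map<String, String> RENAMES = Map.of(
            "userId", "user_id",
            "movieId", "item_id",
            "rating", "preference",
            "timestamp", "created_at");

    private MahoutFieldRenamer() {
    }

    // Returns a copy of the document with the known keys renamed.
    // Any other key (for example _id) is copied through unchanged.
    static Map<String, Object> rename(Map<String, Object> document) {
        Map<String, Object> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            String key = RENAMES.getOrDefault(entry.getKey(), entry.getKey());
            renamed.put(key, entry.getValue());
        }
        return renamed;
    }

    public static void main(String[] args) {
        // Values copied from the sample document in the question.
        Map<String, Object> original = new LinkedHashMap<>();
        original.put("userId", 1);
        original.put("movieId", 292);
        original.put("rating", 4);
        original.put("timestamp", 847116936);
        System.out.println(rename(original));
    }
}
```

In a real migration the same mapping would be applied to every document in the collection, or the fields would be renamed in MongoDB itself. The Map version only shows which keys change and which stay.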

It works, but it is too slow when I have more than 10000 users with 1000 items.

I think the problem is that Mahout assumes default names for the fields that must exist in your MongoDB collection: user_id for the user ID, item_id for the item ID, and preference for the preference value. The solution might be to use another MongoDBDataModel constructor that lets you pass the names of those fields in your MongoDB instance as parameters, or to redesign your collection's schema.
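Before changing the constructor or the schema, it may help to confirm which field names a document actually lacks. The following is a rough sketch in plain Java, with the document modelled as a Map so it runs without Mahout or MongoDB. FieldNameCheck and missingFields are hypothetical names, and the default field names are taken from the answer above rather than verified against a particular Mahout version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for illustration; not part of Mahout.
final class FieldNameCheck {

    private FieldNameCheck() {
    }

    // Returns the expected field names that are absent (or null) in the document.
    static List<String> missingFields(Map<String, Object> document, String... expectedFields) {
        List<String> missing = new ArrayList<>();
        for (String field : expectedFields) {
            if (document.get(field) == null) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Field names and values copied from the sample document in the question.
        Map<String, Object> sample = Map.of(
                "userId", 1, "movieId", 292, "rating", 4, "timestamp", 847116936);

        // Default names as described in the answer above: all three are reported missing.
        System.out.println(missingFields(sample, "user_id", "item_id", "preference"));

        // Names the collection actually uses: nothing is reported missing.
        System.out.println(missingFields(sample, "userId", "movieId", "rating"));
    }
}
```

If the first call reports missing fields and the second does not, the mismatch is in the names. In that case either passing the collection's own field names to the data model or renaming the fields in the collection should line the two up.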

I hope that makes sense.
