
Content-based recommender engine using Mahout on Eclipse

Are there any step-by-step tutorials for building a content-based recommender system with Mahout in Eclipse/Java?

I've tried working with Mahout and was able to build a collaborative filtering system, but I want to try making a content-based one. I've read about writing a custom ItemSimilarity implementation, and I recently discovered Mahout's RowSimilarityJob. I'm relatively new to Mahout; can someone help me understand how to use it?

Actually, the itemSimilarity job 1) lives in the old, soon-to-be-deprecated Hadoop MapReduce code and 2) finds similar documents in a rather simplistic manner. There is a new Spark version of the job, called spark itemSimilarity, that does much the same but only supports LLR (log-likelihood ratio) scores as the similarity measure.
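To give a feel for what an LLR score measures, here is a minimal, self-contained sketch of the log-likelihood ratio over a 2x2 co-occurrence table. It is written from the standard entropy formulation of the test, not copied from Mahout; the class and method names (`LlrSketch`, `llr`) are hypothetical. The assumed inputs are: `k11` = both items occur together, `k12` = item A without B, `k21` = item B without A, `k22` = neither.

```java
// Hypothetical illustration of an LLR similarity score; not Mahout's API.
final class LlrSketch {

    // x * ln(x), with the usual convention that 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a set of counts.
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    // LLR for a 2x2 contingency table; higher means stronger association.
    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy));
    }

    public static void main(String[] args) {
        System.out.println(llr(10, 10, 10, 10)); // counts consistent with independence
        System.out.println(llr(10, 0, 0, 10));   // items that always co-occur
    }
}
```

A table whose counts look independent scores near zero, while items that always appear together score high, which is why LLR works as a filter for "interesting" co-occurrences.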

Unless you are incorporating it into a larger, more complex recommender, I'd suggest you just use Elasticsearch or Solr to find similar items by content. They offer much more robust methods that are also quite flexible. At their core, both use Lucene, the pre-eminent kNN (k-nearest neighbors) engine for sparse data.
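As one possible way to do this in Elasticsearch, the `more_like_this` query finds documents similar to an already indexed one. The sketch below only builds the JSON request body in plain Java; the index name `items`, the fields `title` and `description`, and the helper class `MoreLikeThisQuery` are assumptions for illustration, and the tuning values are arbitrary starting points.

```java
// Hypothetical helper that builds a more_like_this request body as a string.
// It does no JSON escaping, so it assumes simple alphanumeric names and ids.
final class MoreLikeThisQuery {

    static String body(String index, String itemId, int size, String... fields) {
        StringBuilder fieldList = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                fieldList.append(",");
            }
            fieldList.append('"').append(fields[i]).append('"');
        }
        return "{\"size\":" + size
            + ",\"query\":{\"more_like_this\":{"
            + "\"fields\":[" + fieldList + "],"
            + "\"like\":[{\"_index\":\"" + index + "\",\"_id\":\"" + itemId + "\"}],"
            + "\"min_term_freq\":1,"
            + "\"max_query_terms\":25}}}";
    }

    public static void main(String[] args) {
        // Body to POST to /items/_search on an Elasticsearch node.
        System.out.println(body("items", "42", 10, "title", "description"));
    }
}
```

In a real project you would more likely use a JSON library or an Elasticsearch client than string concatenation; the point here is only the shape of the query.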

kNN is the type of algorithm you want: given an item with several content fields, which items are most similar?
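To make the idea concrete, here is a toy, in-memory version of content-based kNN in plain Java: each item's content is turned into a bag-of-words vector and neighbors are ranked by cosine similarity. This is only a sketch of the technique under simple assumptions (raw term counts, no stemming or TF-IDF weighting); the class `ContentKnn` and its methods are hypothetical, and Lucene-based engines do this far more efficiently and with better scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy example of content-based k-nearest neighbors.
final class ContentKnn {

    // Bag-of-words term counts for an item's concatenated content fields.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity between two sparse term-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Ids of the k items most similar to queryId, most similar first.
    static List<String> nearest(String queryId, Map<String, String> items, int k) {
        Map<String, Integer> query = termCounts(items.get(queryId));
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, String> e : items.entrySet()) {
            if (!e.getKey().equals(queryId)) {
                scores.put(e.getKey(), cosine(query, termCounts(e.getValue())));
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        Map<String, String> items = new HashMap<>();
        items.put("a", "red cotton shirt");
        items.put("b", "blue cotton shirt");
        items.put("c", "steel kitchen knife");
        System.out.println(nearest("a", items, 1));
    }
}
```

This brute-force version compares the query against every item, which is fine for a few thousand items in a prototype but is exactly the work an inverted index avoids.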

Elasticsearch and Solr also run as servers that are performant and highly scalable. Plus, they do not require constant retraining. Just add a new doc for every item and they will index it incrementally, so query results will eventually include the newer docs without a training step.

But be aware that content-based recommendations are seldom nearly as good as collaborative filtering if you have the right data. Arguably the best open-source example of a modern multi-modal CF recommender is the Universal Recommender (based on Mahout and Apache PredictionIO): http://actionml.com/docs/ur

