
Mahout Item Similarity Output Empty

I'm using Mahout's ItemSimilarityJob to compute similarity of items with an input .csv file that looks like this:

user_id(numbers only), song_id(numbers only), listens(numbers only)

When I run the ItemSimilarityJob with these parameters

$MAHOUT_HOME/bin/mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input inputcsv/ --output outputcsv --similarityClassname SIMILARITY_PEARSON_CORRELATION --tempDir tempcsv --booleanData true

I get an empty part-r-00000 file inside the music/csvoutput directory. There are many files inside music/csvtemp, however. What could be the reason?

Probably your input isn't where you think it is, or you aren't pointing the job at the location you think you are. The --input is usually given as a fully qualified path; check that and try again. Alternatively, your data may be so small that no similarities can be computed.
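One quick way to test the "data is too small" theory is to count how many item pairs share at least one user, since a pair with no common users has nothing to compare. Below is a minimal sketch in Python, assuming the file holds headerless `user_id,song_id,listens` rows as described in the question; the function name `summarize_preferences` and the sample rows are made up for illustration and are not part of Mahout.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def summarize_preferences(csv_text):
    """Count users, items, and item pairs that share at least one user."""
    items_by_user = defaultdict(set)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        user_id, item_id = row[0].strip(), row[1].strip()
        items_by_user[user_id].add(item_id)

    all_items = set()
    shared_pairs = set()
    for items in items_by_user.values():
        all_items.update(items)
        # every pair of items seen by the same user is a candidate pair
        shared_pairs.update(combinations(sorted(items), 2))

    return {
        "users": len(items_by_user),
        "items": len(all_items),
        "co_rated_pairs": len(shared_pairs),
    }


# Hypothetical sample: only user 1 listened to more than one song.
sample = "1,10,3\n1,11,5\n2,10,1\n3,12,4\n"
print(summarize_preferences(sample))
```

If `co_rated_pairs` comes back as zero or very small for the real file, an empty result is not surprising regardless of the similarity measure chosen.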

I hope my experience helps others; knowing this could have saved me some precious time. You should also check the value of the --threshold parameter. Setting it too high (even 0.01) causes Mahout to filter out data and eventually generate empty files. In my case it was my randomly generated data that caused this.

mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob -i intro.csv --output outputcsv --similarityClassname SIMILARITY_PEARSON_CORRELATION -m 3 --tempDir tempcsv --threshold 0.7 --booleanData

This will use it.
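To see how a threshold can empty the output, here is a rough sketch in Python that computes a plain Pearson correlation over the users two items have in common and then drops pairs below a cutoff. This is an illustration only, not Mahout's distributed implementation, which may compute the measure differently; the names `pearson` and `similar_pairs` and the sample preferences are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation; None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None  # not enough common users
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        return None  # one of the items has constant values
    return sum(a * b for a, b in zip(dx, dy)) / denom


def similar_pairs(prefs, threshold):
    """prefs maps item -> {user: value}; keep pairs scoring >= threshold."""
    kept = {}
    for a, b in combinations(sorted(prefs), 2):
        common = sorted(set(prefs[a]) & set(prefs[b]))
        score = pearson([prefs[a][u] for u in common],
                        [prefs[b][u] for u in common])
        if score is not None and score >= threshold:
            kept[(a, b)] = score
    return kept


# Hypothetical listens for three songs by users 1, 2 and 3.
prefs = {
    "10": {1: 1, 2: 2, 3: 3},
    "11": {1: 2, 2: 4, 3: 6},
    "12": {1: 1, 2: 3, 3: 2},
}
print(len(similar_pairs(prefs, 0.0)))  # pairs kept with no real cutoff
print(len(similar_pairs(prefs, 0.7)))  # pairs kept with a 0.7 cutoff
```

With randomly generated data the pairwise scores tend to sit near zero, so even a small positive cutoff can remove most or all pairs, which matches the empty files described above.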
