How to input a CSV file with timestamps etc. into Mahout to achieve similarity functions etc.?
Currently, I am trying to feed my data into Mahout for machine learning purposes. The data looks like the following, with three columns (the first is the time, the second is a code and the third is a number):
2016-06-05 00:00:00 fd04:bd3:80e8:2:215:8d00:35:ca4b 0
2016-06-05 00:00:00 fd04:bd3:80e8:2:215:8d00:35:f2be 0.12549
2016-06-05 00:00:00 fd04:bd3:80e8:2:215:8d00:35:c8a1 0.14091
2016-06-05 00:00:01 fd04:bd3:80e8:2:215:8d00:35:ca4b 0
2016-06-05 00:00:01 fd04:bd3:80e8:2:215:8d00:35:f2be 0.25098
2016-06-05 00:00:01 fd04:bd3:80e8:2:215:8d00:35:c8a1 0
2016-06-05 00:00:02 fd04:bd3:80e8:2:215:8d00:35:ca4b 0
2016-06-05 00:00:02 fd04:bd3:80e8:2:215:8d00:35:f2be 0.25098
The following is the code to import the data into Mahout:
import java.util.List;
import java.io.File;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
public class RecommenderIntro {
    public static void main(String[] args) throws Exception {
        // Load the preference data from the CSV file
        DataModel model = new FileDataModel(new File("/home/leo/csv_dump11.csv"));
        // Compare users by the Pearson correlation of their preferences
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Use the 2 most similar users as the neighborhood
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Ask for 1 recommendation for the user with ID 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
How can I achieve classification functionality etc.? Please let me know. Thank you very much!
Your input file is in the wrong format for FileDataModel. If you look at the source code you'll see it's expecting:
userID,itemID[,preference[,timestamp]]
This is in line with the java.lang.NumberFormatException error you're seeing. It's expecting a userID as a long and you have a formatted date.
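To make the failure concrete, here is a minimal standalone sketch. It does not call Mahout; it assumes the ID columns are parsed the way Long.parseLong parses a string, and the class and method names (UserIdParseDemo, isValidLongId) are made up for illustration.

```java
// Standalone demonstration; it does not use Mahout itself.
class UserIdParseDemo {

    // Returns true when the field can be parsed as a long, which is
    // what a numeric ID column has to satisfy.
    static boolean isValidLongId(String field) {
        try {
            Long.parseLong(field);
            return true;
        } catch (NumberFormatException e) {
            // This is the kind of exception reported for the sample file.
            return false;
        }
    }

    public static void main(String[] args) {
        // First column of the sample data: a formatted date, not a long.
        System.out.println(isValidLongId("2016-06-05 00:00:00"));
        // Second column of the sample data: an address-like code, not a long.
        System.out.println(isValidLongId("fd04:bd3:80e8:2:215:8d00:35:ca4b"));
        // A plain numeric ID parses fine.
        System.out.println(isValidLongId("42"));
    }
}
```

Neither of the first two columns in the sample data passes this check, so both would need converting before they can serve as IDs.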
Also note that the timestamp should be a long. The documentation in the source indicates that, if you don't want to convert all your dates to milliseconds, you can provide your own function to parse the date by overriding readTimestampFromString(String).
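As a rough sketch of the date conversion, the helper below turns a string in the sample's format into milliseconds since the epoch using only java.time. The class name TimestampParser and the choice of UTC are assumptions (the question does not state a time zone). In a FileDataModel subclass, logic like this could plausibly form the body of the overridden readTimestampFromString(String), assuming that method returns a long.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class TimestampParser {

    // Pattern matching the first column of the sample data.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Converts a formatted date to epoch milliseconds.
    // Assumption: the timestamps are in UTC; the question does not say.
    static long toEpochMillis(String value) {
        return LocalDateTime.parse(value, FORMAT)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2016-06-05 00:00:00"));
        System.out.println(toEpochMillis("2016-06-05 00:00:01"));
    }
}
```

If the data was recorded in a local time zone, swapping ZoneOffset.UTC for the matching offset would be the only change needed.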
So you'll either need to reformat your data to work with this class, or extend it and override the relevant methods needed to parse it correctly (if possible).
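If you go the reformatting route, the code column also needs a numeric stand-in. One possible approach, sketched here with a hypothetical CodeIdMapper class, is to hand out sequential long IDs in the order the codes first appear. Which column ends up as userID and which as itemID depends on what you are modeling, which the question does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CodeIdMapper {

    // Remembers the ID already given to each code, in first-seen order.
    private final Map<String, Long> ids = new LinkedHashMap<>();

    // Returns the existing ID for a code, or assigns the next free one
    // (IDs start at 1).
    long idFor(String code) {
        Long existing = ids.get(code);
        if (existing != null) {
            return existing;
        }
        long next = ids.size() + 1L;
        ids.put(code, next);
        return next;
    }

    public static void main(String[] args) {
        CodeIdMapper mapper = new CodeIdMapper();
        String[] codes = {
            "fd04:bd3:80e8:2:215:8d00:35:ca4b",
            "fd04:bd3:80e8:2:215:8d00:35:f2be",
            "fd04:bd3:80e8:2:215:8d00:35:c8a1",
            "fd04:bd3:80e8:2:215:8d00:35:ca4b"  // repeated code keeps its ID
        };
        for (String code : codes) {
            System.out.println(mapper.idFor(code) + "," + code);
        }
    }
}
```

The mapping only lives in memory, so it would have to be saved alongside the converted file if you later need to translate numeric IDs back into the original codes.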