How to input a CSV file with timestamps etc. into Mahout to compute similarity etc.?

Currently, I am trying to input my data for machine learning purposes. The data looks like the following, with three columns (the first is the time, the second is a code, and the third is a number):

2016-06-05 00:00:00      fd04:bd3:80e8:2:215:8d00:35:ca4b   0
2016-06-05 00:00:00      fd04:bd3:80e8:2:215:8d00:35:f2be   0.12549
2016-06-05 00:00:00      fd04:bd3:80e8:2:215:8d00:35:c8a1   0.14091
2016-06-05 00:00:01      fd04:bd3:80e8:2:215:8d00:35:ca4b   0
2016-06-05 00:00:01      fd04:bd3:80e8:2:215:8d00:35:f2be   0.25098
2016-06-05 00:00:01      fd04:bd3:80e8:2:215:8d00:35:c8a1   0
2016-06-05 00:00:02      fd04:bd3:80e8:2:215:8d00:35:ca4b   0
2016-06-05 00:00:02      fd04:bd3:80e8:2:215:8d00:35:f2be   0.25098

The following is the code I use to import the data into Mahout:

import java.util.List;
import java.io.File;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;

public class RecommenderIntro {

    public static void main(String[] args) throws Exception {
        // Load the preference data; FileDataModel parses every line of the file.
        DataModel model = new FileDataModel(new File("/home/leo/csv_dump11.csv"));
        // Compare users by the Pearson correlation of their preference values.
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Use the 2 most similar users as the neighborhood.
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Ask for the top 1 recommendation for user ID 1.
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}

How can I achieve classification etc. with this data? Please let me know. Thank you very much!

Your input file is in the wrong format for FileDataModel. If you look at the source code, you'll see it expects:

userID,itemID[,preference[,timestamp]]

This is in line with the java.lang.NumberFormatException you're seeing: it expects the userID to be a long, and you have a formatted date. The itemID must be a long too, so your second column (an IPv6-style address) would fail in the same way.
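To see why the exception occurs, the sketch below splits one row and checks whether the first two fields parse as a `long`, which is what `FileDataModel` needs for the user and item IDs. It assumes the dump is comma-separated (the post doesn't show the delimiter), and `IdFieldCheck` is a hypothetical name; it only approximates the check with `Long.parseLong` and is not Mahout's own parsing code.

```java
public class IdFieldCheck {

    // Returns true if the field parses as a long, the type
    // FileDataModel requires for user and item IDs.
    public static boolean isLongId(String field) {
        try {
            Long.parseLong(field.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One row from the question, ASSUMING comma-separated fields.
        String line = "2016-06-05 00:00:00,fd04:bd3:80e8:2:215:8d00:35:ca4b,0";
        String[] fields = line.split(",");
        // Neither the date nor the IPv6-style code parses as a long.
        System.out.println("column 1 is a long: " + isLongId(fields[0]));
        System.out.println("column 2 is a long: " + isLongId(fields[1]));
    }
}
```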

Also note that the timestamp should be a long (by default, milliseconds since the epoch). The documentation in the source indicates that, if you don't want to convert all your dates to milliseconds, you can override readTimestampFromString(String) to provide your own date-parsing function.
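If you do convert the dates yourself, the conversion is short with `java.time`. This is a minimal sketch, assuming the timestamps are in UTC (the question doesn't say which time zone they use); `TimestampConversion` and `toEpochMillis` are hypothetical names. The same method body is what an override of `readTimestampFromString(String)` in a `FileDataModel` subclass would need to return, though that override is untested here.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampConversion {

    // Pattern matching the question's first column, e.g. "2016-06-05 00:00:00".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Converts a formatted date to milliseconds since the epoch.
    // ASSUMPTION: the data is in UTC; use another ZoneOffset if it is local time.
    public static long toEpochMillis(String value) {
        LocalDateTime dateTime = LocalDateTime.parse(value.trim(), FORMAT);
        return dateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2016-06-05 00:00:00")); // 1465084800000
        System.out.println(toEpochMillis("2016-06-05 00:00:01")); // 1465084801000
    }
}
```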

So you'll either need to reformat your data to work with this class, or extend it and override the methods needed to parse it correctly (if possible).
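As one way to reformat the data, the sketch below rewrites each row as `userID,itemID,preference`, which the `FileDataModel` Javadoc documents as valid input (the preference and timestamp fields are optional). The mapping is a hypothetical choice, not something the question specifies: each device code becomes a user, each distinct timestamp becomes an item, and the reading becomes the preference, so a user-based similarity would compare devices by their readings at shared timestamps. It also assumes whitespace-separated columns as shown in the post; `ToTasteFormat` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ToTasteFormat {

    // HYPOTHETICAL mapping: device code -> user ID, timestamp -> item ID.
    private final Map<String, Long> userIds = new LinkedHashMap<>();
    private final Map<String, Long> itemIds = new LinkedHashMap<>();

    // Returns the long ID already assigned to key, or assigns the next one (1, 2, 3, ...).
    private static long idFor(Map<String, Long> ids, String key) {
        Long id = ids.get(key);
        if (id == null) {
            id = (long) ids.size() + 1;
            ids.put(key, id);
        }
        return id;
    }

    // Converts a whitespace-separated line such as
    // "2016-06-05 00:00:00   fd04:bd3:80e8:2:215:8d00:35:f2be   0.12549"
    // into "userID,itemID,preference".
    public String convertLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 4) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        String timestamp = tokens[0] + " " + tokens[1]; // date and time columns
        String code = tokens[2];
        String value = tokens[3];
        Float.parseFloat(value); // fail early if the reading is not numeric
        long userId = idFor(userIds, code);
        long itemId = idFor(itemIds, timestamp);
        return userId + "," + itemId + "," + value;
    }

    // Device code -> user ID, needed to translate Mahout's output back.
    public Map<String, Long> getUserIds() {
        return userIds;
    }

    public static void main(String[] args) {
        String[] sample = {
            "2016-06-05 00:00:00      fd04:bd3:80e8:2:215:8d00:35:ca4b   0",
            "2016-06-05 00:00:00      fd04:bd3:80e8:2:215:8d00:35:f2be   0.12549",
            "2016-06-05 00:00:01      fd04:bd3:80e8:2:215:8d00:35:ca4b   0",
        };
        ToTasteFormat converter = new ToTasteFormat();
        for (String line : sample) {
            System.out.println(converter.convertLine(line)); // 1,1,0 then 2,1,0.12549 then 1,2,0
        }
    }
}
```

Writing the converted lines to a file gives input that `FileDataModel` can parse. Keep the code-to-ID map so user IDs in the results can be traced back to devices. Whether "devices as users, timestamps as items" is the right model depends on what the columns actually mean.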
