
How to For Each RDD Spark Streaming

I have one CSV file, queries.txt, and I am reading the file like this:

JavaRDD<String> distFile = sc.textFile("queries.txt");

The schema of the queries.txt file is: Uniq_Id,...some numeric values in CSV...

For each line, I need to create a HashMap entry whose key is the first column of the queries.txt file (Uniq_Id) and whose value holds the remaining columns of that line.

Example (this is not real, working code; I just want to convey the essence):

// Pseudocode: for every line, key = first column (Uniq_Id), value = remaining columns
HashMap<Integer, NumericValues> totalMap = new HashMap<Integer, NumericValues>();

for (int i = 0; i < distFile.size(); i++)
{
   String line = distFile[i].getColumns();
   totalMap.put(line.getFirstColumn(), line.getRemainingColumns());
}

Here NumericValues is my custom class, which has fields mapping to the columns in the file.
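
For illustration only, here is a minimal sketch of what NumericValues and the per-line parsing might look like, assuming every column after Uniq_Id is a double; the parseLine helper and the double[] layout are hypothetical, not part of the original post.

  import java.io.Serializable;
  import java.util.AbstractMap.SimpleEntry;

  public class NumericValues implements Serializable {
      private final double[] values;

      public NumericValues(double[] values) {
          this.values = values;
      }

      public double[] getValues() {
          return values;
      }

      // Parse one CSV line: the first column is the Integer key (Uniq_Id),
      // the remaining columns become the NumericValues payload.
      public static SimpleEntry<Integer, NumericValues> parseLine(String line) {
          String[] cols = line.split(",");
          double[] vals = new double[cols.length - 1];
          for (int i = 1; i < cols.length; i++) {
              vals[i - 1] = Double.parseDouble(cols[i]);
          }
          return new SimpleEntry<Integer, NumericValues>(Integer.parseInt(cols[0]), new NumericValues(vals));
      }
  }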

Any other suggestions would be helpful.

I guess this is what you are looking for, but this example doesn't parse the CSV line itself.

  JavaRDD<String> distFile = sc.textFile("queries.txt");
  // must be final to be captured by the anonymous inner class
  final HashMap<Integer, NumericValues> totalMap = new HashMap<Integer, NumericValues>();
  distFile.foreach(new VoidFunction<String>() {
          public void call(String line) {
              // yourIdParser / yourCSVParser are dummy placeholders for your own CSV parsing
              totalMap.put(yourIdParser(line), yourCSVParser(line));
          }
  });
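
Note that the foreach above mutates a driver-side HashMap from inside a closure; Spark serializes that closure to the executors, so the driver's totalMap will not actually be populated. As an alternative (a sketch, not the answer's code), one common pattern is mapToPair followed by collectAsMap, shown below with the same illustrative assumption that every column after Uniq_Id is a double:

  import java.util.Map;

  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.function.PairFunction;

  import scala.Tuple2;

  // Build (Uniq_Id -> NumericValues) pairs on the executors, then bring the
  // result back to the driver as a Map.
  JavaRDD<String> distFile = sc.textFile("queries.txt");

  JavaPairRDD<Integer, NumericValues> pairs = distFile.mapToPair(
          new PairFunction<String, Integer, NumericValues>() {
              public Tuple2<Integer, NumericValues> call(String line) {
                  String[] cols = line.split(",");
                  double[] vals = new double[cols.length - 1];
                  for (int i = 1; i < cols.length; i++) {
                      vals[i - 1] = Double.parseDouble(cols[i]);
                  }
                  return new Tuple2<Integer, NumericValues>(
                          Integer.parseInt(cols[0]), new NumericValues(vals));
              }
          });

  Map<Integer, NumericValues> totalMap = pairs.collectAsMap();

collectAsMap pulls everything to the driver, so this only makes sense when the whole file fits in driver memory; for an actual Spark Streaming job, the same mapToPair could be applied to each batch RDD inside foreachRDD on the DStream.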
