Reading a file in Scala and getting key-value pairs as Map[String, List[String]]

Question

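I am reading a file and getting the records as a Map[String, List[String]] in Spark-Scala. I want to do the same thing in pure Scala, without any Spark references (without reading into an RDD). What should I change to make it work the pure Scala way?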

rdd
      .filter(x => (x != null) && (x.length > 0))
      .zipWithIndex()
      .map {
        case (line, index) =>
          val array = line.split("~").map(_.trim)
          (array(0), array(1), index)
      }
      .groupBy(_._1)
      .mapValues(x => x.toList.sortBy(_._3).map(_._2))
      .collect
      .toMap

Answer 1

For the most part it stays the same, except for the groupBy part of the RDD code. Scala's List also has map, filter, reduce and similar methods, so they can be used in almost the same way.

import scala.io.Source
val lines = Source.fromFile("filename.txt").getLines().toList

Once the file has been read and stored in a List, these methods can be applied to it.
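Because Scala's List also provides zipWithIndex and groupBy, the Spark pipeline from the question can be ported almost line for line. The following is a minimal sketch, assuming every non-empty line has the form key~value; the object name PureScalaGrouping is hypothetical. Two differences from the RDD version: there is no collect step, and mapValues is replaced by map, because in Scala 2.13 Map.mapValues is deprecated and returns a lazy MapView.

```scala
object PureScalaGrouping {
  // Same steps as the Spark pipeline, applied to an in-memory List[String].
  def group(lines: List[String]): Map[String, List[String]] =
    lines
      .filter(x => (x != null) && (x.length > 0)) // drop null and empty lines
      .zipWithIndex                                // pair each line with its position (Int, not Long)
      .map { case (line, index) =>
        val array = line.split("~").map(_.trim)    // assumes "key~value"
        (array(0), array(1), index)
      }
      .groupBy(_._1)                               // Map[String, List[(String, String, Int)]]
      .map { case (key, xs) => key -> xs.sortBy(_._3).map(_._2) } // order by line index, keep values
}
```

It could be called as PureScalaGrouping.group(lines) with the lines list read above. As in the original, a line without a ~ throws ArrayIndexOutOfBoundsException. The standard library's List.groupBy builds each group in traversal order, so sortBy(_._3) is effectively redundant here; it is kept only to mirror the Spark code.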

For the groupBy part, one simple approach is to sort the tuples on the key. This effectively brings tuples with the same key together.

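// arr: a Seq of (key, value, index) tuples; stableSort keeps equal keys in input order
val grouped = scala.util.Sorting.stableSort(arr,
  (e1: (String, String, Int), e2: (String, String, Int)) => e1._1 < e2._1)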

There are surely better solutions, but this gets the same job done.
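Sorting only clusters the tuples; they still have to be collapsed into a Map. A sketch of that last step, with hypothetical names SortThenCluster, cluster and groupBySorting: it uses sortBy instead of Sorting.stableSort, since sortBy is also documented as stable and returns a List, and a foldRight then merges each run of equal keys into one entry. This is illustrative only; calling groupBy directly is simpler.

```scala
object SortThenCluster {
  // Collapses runs of equal keys in a key-sorted sequence into (key, values) pairs.
  def cluster(sorted: Seq[(String, String, Int)]): List[(String, List[String])] =
    sorted.foldRight(List.empty[(String, List[String])]) {
      case ((k, v, _), (k2, vs) :: rest) if k == k2 => (k, v :: vs) :: rest // extend current run
      case ((k, v, _), acc)                         => (k, List(v)) :: acc  // start a new run
    }

  // Stable sort on the key, then cluster; values keep their original relative order.
  def groupBySorting(tuples: Seq[(String, String, Int)]): Map[String, List[String]] =
    cluster(tuples.sortBy(_._1)).toMap
}
```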

Answer 2

I came up with the following approach:

import scala.io.Source

Source.fromInputStream(getClass.getResourceAsStream(filePath)).getLines()
  .filter(line => (line != null) && (line.length > 0))
  .map(_.split("~")).toList
  .groupBy(_(0)).map { case (key, values) => (key, values.map(_(1))) }
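A variant of the same approach for a file on disk, sketched under the assumption of Scala 2.13+ (for scala.util.Using): the Source is closed after reading, lines and fields are trimmed, and failures (a missing file, or a line without a ~) come back as a Failure instead of being thrown. KeyValueFile and read are hypothetical names.

```scala
import scala.io.Source
import scala.util.{Try, Using}

object KeyValueFile {
  // Reads "key~value" lines from a file; needs Scala 2.13+ for scala.util.Using.
  def read(path: String): Try[Map[String, List[String]]] =
    Using(Source.fromFile(path)) { source =>
      source
        .getLines()                    // never yields null, so no null check is needed
        .map(_.trim)
        .filter(_.nonEmpty)            // skip blank lines
        .map(_.split("~").map(_.trim)) // a line without "~" makes _(1) throw below
        .toList                        // materialize before Using closes the Source
        .groupBy(_(0))
        .map { case (key, parts) => key -> parts.map(_(1)) }
    }
}
```

To read a classpath resource as the answer does, Source.fromResource (available since Scala 2.12) could be used in place of Source.fromFile.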

2 answers

Answer 1
Score 1, accepted, 2020-05-02 20:31:18

Answer 2
Score 0, 2020-05-04 07:54:08
