Functional programming in Scala: output the most frequently occurring word (or list of words) in a text file?

Question

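Output the word (or list of words) that occurs most often in a text file, ignoring case (i.e. "Word" and "word" are treated as the same for this purpose). We are only interested in words made of alphabetic characters [A-Za-z], so ignore any digits, punctuation, etc.

If several words share the highest frequency, all of them should be printed as a list. Next to the words, you should output the number of occurrences. For example:

The most frequently occurring words are ["and", "it", "the"], each occurring 10 times in the text.

I have the following code: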

    val counter: Map[String, Int] = scala.io.Source.fromFile(file).getLines
      .flatMap(_.split("[^-A-Za-z]+")).foldLeft(Map.empty[String, Int]) {
        (count, word) => count + (word.toLowerCase -> (count.getOrElse(word, 0) + 1))
      }
    val list = counter.toList.sortBy(_._2).reverse

This gets as far as creating a list of the words in descending order of occurrence. I don't know how to proceed from here.

Answer 1

Well, you are almost there ...

   val maxNum = list.headOption.fold(0)(_._2) // What's the max number? (head of the sorted list, not of the unordered map)
   list
     .iterator // not necessary, but makes it a bit faster to perform chained transformations
     .takeWhile(_._2 == maxNum) // Get all words that have that count
     .map(_._1) // drop the counts, keep only words
     .foreach(println) // Print them out
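
For illustration, here is the same idea as a small function you can try on an in-memory map. This is only a sketch: the object name `SortedTop` and the helper `topWords` are made up for this example and do not appear in the question.

```scala
object SortedTop {
  // Hypothetical helper: returns every word tied for the highest count.
  def topWords(counter: Map[String, Int]): List[String] = {
    val list = counter.toList.sortBy(_._2).reverse // descending by count
    val maxNum = list.headOption.fold(0)(_._2)     // 0 when the map is empty
    list.iterator
      .takeWhile(_._2 == maxNum) // stop at the first smaller count
      .map(_._1)                 // keep only the words
      .toList
  }
}
```

The order of the tied words is whatever the sort leaves behind, so sort the result if you need a stable order.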

As pointed out in the comments, one major problem with your solution is that you should not sort the list just to find the maximum. Just do this:

    val maxNum = counter.maxByOption(_._2).fold(0)(_._2)
    counter
     .iterator
     .collect { case (w, `maxNum`) => w }
     .foreach(println)
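
To get from there to the sentence the question asks for, something along these lines might work. It is a sketch under assumptions: `MaxTop`, `topWords` and `report` are names invented here, and the exact wording of the output is copied from the example in the question.

```scala
object MaxTop {
  // Hypothetical helper: the words tied for the top count, plus that count.
  def topWords(counter: Map[String, Int]): (List[String], Int) = {
    val maxNum = counter.valuesIterator.maxOption.getOrElse(0) // no sorting needed
    val words = counter.iterator
      .collect { case (w, `maxNum`) => w } // backticks match against the value of maxNum
      .toList
      .sorted                              // alphabetical, so the output is stable
    (words, maxNum)
  }

  // Hypothetical helper: formats the result like the example in the question.
  def report(counter: Map[String, Int]): String = {
    val (words, n) = topWords(counter)
    val quoted = words.map(w => "\"" + w + "\"").mkString("[", ", ", "]")
    s"The most frequently occurring words are $quoted, each occurring $n times in the text."
  }
}
```

Both passes over the map are linear, so the cost stays O(n) rather than the O(n log n) of sorting.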

Also, what you did with foldLeft can be done more elegantly with groupMapReduce, along with a few "cosmetic" improvements to your counting:

    val counter = source.getLines()
        .flatMap(_.split("\\b")) // \b is a regex symbol for "word boundary"
        .filter(_.matches("\\w+")) // filter out the delimiters - you have a little bug here, that results in your counting spaces as "words"
        .map(_.toLowerCase) // "Word" and "word" must count as the same word
        .toSeq // groupMapReduce is defined on collections, not on Iterator
        .groupMapReduce(identity)(_ => 1)(_ + _) // group data by word, replace each occurrence of a word with `1`, and add them all up
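
If you want to stick to the letters-only rule from the question, one possible variant is to extract the words with a regex instead of splitting on delimiters. Note that this swaps `split` for `findAllIn`, which is my substitution, and that `WordCount` and `countWords` are hypothetical names. It also sidesteps two issues in the original fold: a line that starts with a delimiter makes `split` return an empty first token, and looking up `word` while storing `word.toLowerCase` can reset a count when the same word appears in mixed case.

```scala
object WordCount {
  // Runs of ASCII letters only; digits and punctuation never match.
  private val Word = "[A-Za-z]+".r

  // Hypothetical helper: case-insensitive counts of letter-only words.
  def countWords(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(line => Word.findAllIn(line)) // extract the words, so no empty tokens appear
      .map(_.toLowerCase)                    // normalize before grouping
      .toSeq                                 // groupMapReduce is not defined on Iterator
      .groupMapReduce(identity)(_ => 1)(_ + _)
}
```

Called as `WordCount.countWords(source.getLines())`, this yields the `counter` map used in the snippets above. `groupMapReduce` requires Scala 2.13 or later.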

Solution 1
1 2023-01-30 17:26:43
