Functional programming in Scala: output the most frequently occurring word (or list of words) in a text file?

Question

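Output the word (or list of words) that occurs most often in a text file (ignoring case, i.e. "Word" and "word" are treated as the same for this purpose). We are only interested in words made of alphabetic characters [A-Z a-z], so ignore any digits, punctuation, etc.

If several words share the highest frequency, all of them should be printed as a list. Next to the words, you should output the number of occurrences. For example:

The most frequently occurring words are ["and", "it", "the"], each of which appears 10 times in the text.

I have the following code: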

    val counter: Map[String, Int] = scala.io.Source.fromFile(file).getLines
      .flatMap(_.split("[^-A-Za-z]+"))
      .map(_.toLowerCase) // lower-case first so the lookup and the stored key agree
      .foldLeft(Map.empty[String, Int]) { (count, word) =>
        count + (word -> (count.getOrElse(word, 0) + 1))
      }
    val list = counter.toList.sortBy(_._2).reverse

This even produces the list of words in descending order of occurrence. I don't know how to proceed from here.

Answer 1

Well, you are almost there ...

    val maxNum = list.headOption.fold(0)(_._2) // What's the max number? (list is sorted descending, so it is the first count)
    list
      .iterator // not necessary, but makes it a bit faster to perform chained transformations
      .takeWhile(_._2 == maxNum) // Get all words that have that count
      .map(_._1) // drop the counts, keep only words
      .foreach(println) // Print them out

As pointed out in the comments, one major problem with your solution is that you should not sort the list just to find the maximum. Just do

    val maxNum = counter.maxByOption(_._2).fold(0)(_._2)
    counter
     .iterator
     .collect { case (w, `maxNum`) => w }
     .foreach(println)
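
To get from there to the sentence the question asks for, something like the following might work. This is only a sketch: the `MostFrequent` object, its `mostFrequent` and `describe` helpers, and the sample counts are made up for illustration, and it assumes Scala 2.13 or later (for `maxOption`).

```scala
object MostFrequent {
  // Returns the highest count and all words that share it (sorted for stable output).
  def mostFrequent(counter: Map[String, Int]): (Int, List[String]) = {
    val maxNum = counter.valuesIterator.maxOption.getOrElse(0)
    // The backticks match against the existing value instead of binding a new variable.
    val words = counter.iterator.collect { case (w, `maxNum`) => w }.toList.sorted
    (maxNum, words)
  }

  // Formats the result in the style of the example sentence from the question.
  def describe(counter: Map[String, Int]): String = {
    val (maxNum, words) = mostFrequent(counter)
    val quoted = words.map(w => "\"" + w + "\"").mkString("[", ", ", "]")
    s"The most frequently occurring words are $quoted, each of which appears $maxNum times in the text."
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical counts, for illustration only.
    val sample = Map("and" -> 10, "it" -> 10, "the" -> 10, "cat" -> 3)
    println(describe(sample))
  }
}
```

The words are sorted only to make the output deterministic, since a `Map` does not guarantee any iteration order.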

Also, what you are doing with foldLeft can be done more elegantly with groupMapReduce, along with a few "cosmetic" improvements to your counting:

    val counter = source.getLines
        .flatMap(_.split("\\b")) // \b is a regex symbol for "word boundary"
        .filter(_.matches("\\w+")) // filter out the delimiters - you have a little bug here, that results in your counting empty strings as "words"
        .map(_.toLowerCase) // the question asks for case-insensitive counting
        .toSeq // an Iterator has no groupMapReduce, so materialize it first
        .groupMapReduce(identity)(_ => 1)(_ + _) // group data by word, replace each occurrence of a word with `1`, and add them all up
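
For something you can run without a file, here is a self-contained sketch of the same counting step on an in-memory string. Note that it is a variation, not the exact code above: it splits on runs of non-letter characters instead of word boundaries, so that digits are ignored as the question requires. The `WordCount` object, the `countWords` name, and the sample text are assumptions made up for this example; `groupMapReduce` needs Scala 2.13 or later.

```scala
object WordCount {
  // Counts letter-only words case-insensitively.
  def countWords(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(_.split("[^A-Za-z]+")) // split on anything that is not a letter
      .filter(_.nonEmpty)             // split can yield empty strings; drop them
      .map(_.toLowerCase)             // "The" and "the" count as the same word
      .toSeq                          // Iterator itself has no groupMapReduce
      .groupMapReduce(identity)(_ => 1)(_ + _)

  def main(args: Array[String]): Unit = {
    // Made-up sample text standing in for the file contents.
    val text = "The cat and the dog.\nAnd 42 cats!"
    println(countWords(text.linesIterator))
  }
}
```

With a real file you would pass `scala.io.Source.fromFile(file).getLines()` instead of `text.linesIterator`.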
