Word count in Haskell

Question

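I am trying to count the number of occurrences of each word in a text, and then represent the result as a list of tuples.

I have tried using an accumulator, and I have also tried using concat and filter. The problem is that I'm not sure how to handle the lists inside the list.

I'm not sure how to proceed from here. I tried calling wordCountt on the argument with filter (x /=), but for some reason it wouldn't run. I'd really appreciate some guidance here.

Cheers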

type Document = [Sentence]
type WordTally = [(String, Int)]

wordCountt :: Document -> WordTally
wordCountt [] = []
wordCountt [(x:xs), ys] = [(x, length (filter (x ==) (concat [(x:xs), ys])))] ++ wordCountt [xs, ys]

wordCountt [["a", "rose", "is", "a", "rose"],["but", "so", "is", "a", "rose"]]
[("a",3),("rose",3),("is",2),("a",2),("rose",2)*** Exception: CompLing.hs:(60,1)-(61,100): Non-exhaustive patterns in function wordCountt

Answer 1

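I think you are trying to do too much in a single function. That makes the function harder to implement, harder to debug, and, perhaps most importantly, harder to convince yourself that it works.

We can split the function into two parts:

one that adds a given element to a WordTally; and
one that enumerates all the words in the document and keeps updating the WordTally.

The update function will look like this: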

addWord :: WordTally -> String -> WordTally
addWord = …

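So the function takes a WordTally and a String. If the String is already a "member" of the word count, increment its count; otherwise add it with a count of one. You can use explicit recursion for this.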
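A minimal sketch of one way to fill in addWord with explicit recursion, following the description above. This is not the answerer's code (the answer deliberately leaves the body as `…`), and appending unseen words at the end of the tally is a choice made only for this sketch.

```haskell
type WordTally = [(String, Int)]

-- Walk the tally looking for the word; bump its count if found,
-- otherwise append it with a count of 1.
addWord :: WordTally -> String -> WordTally
addWord [] w = [(w, 1)]                      -- not found anywhere: add it
addWord ((v, n) : rest) w
  | v == w    = (v, n + 1) : rest            -- found: increment the count
  | otherwise = (v, n) : addWord rest w      -- keep searching the rest

main :: IO ()
main = do
  print (addWord [] "rose")
  print (addWord [("a", 1), ("rose", 1)] "rose")
  print (addWord [("a", 1)] "is")
```

Each call makes at most one pass over the list, which is where the O(n) cost per word mentioned below comes from.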

Then wordCountt is a fold. In fact, we can write:

wordCountt :: Document -> WordTally
wordCountt d = foldl addWord [] (concat d)

Or, more concisely:

wordCountt :: Document -> WordTally
wordCountt = foldl addWord [] . concat

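So we start with an empty list as the WordTally, take the words from the document's list of lists one at a time, and update the WordTally accordingly, until we reach the last word.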
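To watch the fold step by step, the sketch below uses scanl, which behaves like foldl but also returns every intermediate accumulator. It assumes `type Sentence = [String]` (the question does not show that definition), uses a recursive addWord like the one the answer describes, and `tallySteps` is a hypothetical helper introduced only for illustration.

```haskell
type Sentence  = [String]          -- assumption: not shown in the question
type Document  = [Sentence]
type WordTally = [(String, Int)]

-- A possible recursive addWord (see the answer's description).
addWord :: WordTally -> String -> WordTally
addWord [] w = [(w, 1)]
addWord ((v, n) : rest) w
  | v == w    = (v, n + 1) : rest
  | otherwise = (v, n) : addWord rest w

-- The answer's fold: flatten the sentences, then fold addWord over the words.
wordCountt :: Document -> WordTally
wordCountt = foldl addWord [] . concat

-- Hypothetical helper: scanl keeps every intermediate tally,
-- starting with the empty one.
tallySteps :: Document -> [WordTally]
tallySteps = scanl addWord [] . concat

main :: IO ()
main = do
  mapM_ print (tallySteps [["a", "rose", "is", "a", "rose"]])
  print (wordCountt [["a", "rose", "is", "a", "rose"], ["but", "so", "is", "a", "rose"]])
```

On the question's input, wordCountt gives [("a",3),("rose",3),("is",2),("but",1),("so",1)], with words listed in order of first appearance.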

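However, this will not be very efficient: updating the WordTally list costs O(n) per word, which makes the whole algorithm O(n²). You could (later) look at, for example, Map, a container that supports insertion/update in O(log n).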
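A rough sketch of the Map-based version hinted at above, keeping the same fold shape but using Data.Map.Strict as the accumulator. Map.insertWith (+) w 1 inserts the word with count 1, or adds 1 to its existing count. `wordCountMap` is a hypothetical name, and `type Sentence = [String]` is again an assumption.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

type Sentence  = [String]          -- assumption: not shown in the question
type Document  = [Sentence]
type WordTally = [(String, Int)]

-- Same fold as before, but each update is an O(log n) Map operation
-- instead of a linear scan of a list. foldl' keeps the accumulator evaluated.
wordCountMap :: Document -> Map.Map String Int
wordCountMap = foldl' (\m w -> Map.insertWith (+) w 1 m) Map.empty . concat

-- Convert back to the question's WordTally type.
wordCountt :: Document -> WordTally
wordCountt = Map.toList . wordCountMap

main :: IO ()
main = print (wordCountt [["a", "rose", "is", "a", "rose"], ["but", "so", "is", "a", "rose"]])
```

Map.toList returns the pairs sorted by key, so the result is alphabetical, [("a",3),("but",1),("is",2),("rose",3),("so",1)], rather than in order of first appearance.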

Answer 2

For building histograms, I have always liked Map.fromListWith:

import qualified Data.Map.Strict as Map
import           Data.Map (Map)

histogram :: Ord a => [a] -> Map a Int
histogram xs = Map.fromListWith (+) (zip xs (repeat 1))

How it works:

> zip (words "a rose is a rose") (repeat 1)
[("a",1),("rose",1),("is",1),("a",1),("rose",1)]

> Map.fromListWith (+) [("hello",1),("hello",1)]
fromList [("hello",2)]

> Map.fromListWith (+) [("a",1),("rose",1),("is",1),("a",1),("rose",1)]
fromList [("a",2),("is",1),("rose",2)]

> histogram (words "a rose is a rose")
fromList [("a",2),("is",1),("rose",2)]

So when the same word appears in two (word, count) tuples, their counts are combined with (+).
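Plugged into the question's types, the histogram approach could look like the sketch below: flatten the sentences with concat, build the Map, and convert it back to a WordTally with Map.toList. The `type Sentence = [String]` synonym is an assumption, since the question does not show it.

```haskell
import qualified Data.Map.Strict as Map
import           Data.Map (Map)

type Sentence  = [String]          -- assumption: not shown in the question
type Document  = [Sentence]
type WordTally = [(String, Int)]

-- The answer's histogram: pair every element with 1, then merge
-- duplicate keys by adding their counts.
histogram :: Ord a => [a] -> Map a Int
histogram xs = Map.fromListWith (+) (zip xs (repeat 1))

-- Flatten the sentences, count, and convert back to a list of tuples
-- (Map.toList returns them sorted by word).
wordCountt :: Document -> WordTally
wordCountt = Map.toList . histogram . concat

main :: IO ()
main = print (wordCountt [["a", "rose", "is", "a", "rose"], ["but", "so", "is", "a", "rose"]])
```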

2 answers

Answer 1
3 2020-11-18 18:58:54

Answer 2
3 2020-11-18 19:19:00
