Unhashable type: 'list' - word count

Question

import re
from nltk.corpus import stopwords
from nltk.corpus.reader import PlaintextCorpusReader

corpus = PlaintextCorpusReader("path", '.*', encoding="latin1")
docs = [corpus.words(f) for f in corpus.fileids()]  # one list of words per file
docs2 = [[w.lower() for w in doc] for doc in docs]  # lowercase every word
docs3 = [[w for w in doc if re.search('^[a-z]+$', w)] for doc in docs2]  # keep purely alphabetic tokens
stop_list = stopwords.words('english')

docs4 = [[w for w in doc if w not in stop_list] for doc in docs3]  # drop stop words

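I wrote the code above, which reads a collection of files and applies some preprocessing steps such as removing punctuation and stop words. Now I want to do a word count and find the most frequently used words in the text. I use the code below to do that.

word_counter = {}
for word in docs4:
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1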

popular_words = sorted(word_counter, key=word_counter.get, reverse=True)

But I get the following error:

Traceback (most recent call last):
  File "C:/Users/rohanhm.2014/PycharmProjects/untitled1/bp.py", line 18, in <module>
    if word in word_counter:
TypeError: unhashable type: 'list'

Any suggestions?

Answer 1

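I think `word` is a list, not a string. docs4 holds one list of words per document, so looping over it yields whole lists. Perhaps you are treating a list that contains strings as if it were a single string.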
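
To see this concretely, here is a minimal sketch that reproduces the error. The two-document `docs4` below is made-up sample data standing in for the asker's corpus, and `first_item` and `error_message` are names introduced only for this illustration.

```python
# Hypothetical stand-in for docs4: one inner list of words per document.
docs4 = [["apple", "banana", "apple"], ["banana", "cherry"]]

word_counter = {}

# Iterating over docs4 yields whole documents, so the first item is a list.
first_item = docs4[0]
print(type(first_item).__name__)

# A membership test on a dict has to hash the key, and lists are not hashable.
try:
    first_item in word_counter
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The membership test fails even though the dictionary is empty, because the key is hashed before any lookup happens.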

Answer 2

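Because docs4 is a nested list (a list of lists), each `word` in your loop is itself a list, and a list cannot be hashed. You can flatten the nested list with this code: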

   from itertools import chain
   l = docs4  # the nested list from the question
   print(list(chain.from_iterable(l)))  # one flat list of words
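
Putting the flattening step together with the counting loop from the question might look like the sketch below. The `docs4` value is again made-up sample data, not the asker's corpus, and using `dict.get` with a default is just one way to shorten the if/else.

```python
from itertools import chain

# Hypothetical stand-in for docs4: one inner list of words per document.
docs4 = [["apple", "banana", "apple"], ["banana", "cherry"]]

word_counter = {}
# chain.from_iterable walks every inner list in turn, yielding single words.
for word in chain.from_iterable(docs4):
    word_counter[word] = word_counter.get(word, 0) + 1

# Words sorted from most to least frequent.
popular_words = sorted(word_counter, key=word_counter.get, reverse=True)
print(popular_words)
```

A nested loop (`for doc in docs4: for word in doc:`) would work equally well; `chain.from_iterable` simply avoids building an intermediate flat list.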

Answer 3

There is a simple way to find the most common words of a text in nltk.

>>> import nltk
>>> words = ['a','b','a','a','b','c','d']
>>> fd = nltk.FreqDist(words)
>>> fd.most_common(3)
[('a', 3), ('b', 2), ('c', 1)]
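
Note that `FreqDist` expects a flat sequence of words, so passing the nested docs4 directly would hit the same unhashable-list error. In NLTK 3, `FreqDist` is a subclass of `collections.Counter`, so the sketch below uses `Counter` from the standard library to show the same idea without requiring nltk. The `docs4` value is made-up sample data.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for docs4: one inner list of words per document.
docs4 = [["apple", "banana", "apple"], ["banana", "cherry"]]

# Flatten first, then count; Counter needs hashable items such as strings.
counts = Counter(chain.from_iterable(docs4))

# The two most frequent words with their counts.
top_two = counts.most_common(2)
print(top_two)
```

Swapping `Counter` for `nltk.FreqDist` in the line that builds `counts` should give the same `most_common` result.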
