
TypeError: unhashable type: 'list' for nltk.FreqDist even though I've converted my list into a tuple

I have the following code:

import nltk

# tri_grams is a list of lists of strings (see the PS below)
grams = tuple(i for i in tri_grams)
print(type(grams))
bigram_fd = nltk.FreqDist(grams)
bigram_fd.most_common()

and I get the following error:

<class 'tuple'>    
TypeError                                 Traceback (most recent call last)
<ipython-input-200-4809d6a29102> in <module>
      3 grams = tuple(i for i in tri_grams)
      4 print(type(grams))
----> 5 bigram_fd = nltk.FreqDist(grams)
      6 # bigram_fd = nltk.FreqDist(nltk.bigrams(ngrams))
      7 

c:\Users\Nauel\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\probability.py in __init__(self, samples)
    100         :type samples: Sequence
    101         """
--> 102         Counter.__init__(self, samples)
    103 
    104         # Cached number of samples in this FreqDist

c:\Users\Nauel\AppData\Local\Programs\Python\Python36\lib\collections\__init__.py in __init__(*args, **kwds)
    533             raise TypeError('expected at most 1 arguments, got %d' % len(args))
    534         super(Counter, self).__init__()
--> 535         self.update(*args, **kwds)
    536 
    537     def __missing__(self, key):

c:\Users\Nauel\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\probability.py in update(self, *args, **kwargs)
    138         """
...
--> 622                 _count_elements(self, iterable)
    623         if kwds:
    624             self.update(kwds)

TypeError: unhashable type: 'list'

So what is wrong with my code? I converted the list into a tuple, but FreqDist still won't accept it. I hope I've been clear, thanks! :)

PS: my tri_grams looks like this:

[['potere_crescere', 'molto_vs', 'decentraland_mano', 'can_grow', 'lot_vs'], ['potere_crescere', 'molto_vs', 'decentraland_mano', 'can_grow', 'lot_vs'], ['certo', 'no', 'essere', 'sempre', 'gente', 'innocente', 'pagare', 'prezzo', 'storia', 'Balcani', 'essere', 'molto', 'complesso', 'essere', 'incrocio', 'interesse', 'misto', 'cultura', 'nazione', 'religione', 'gente', 'testardo', 'orgoglioso', 'difficile', 'gestire']]
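Note on the error itself: `tuple(i for i in tri_grams)` only changes the *outer* container. `FreqDist` is a subclass of `collections.Counter`, so it hashes each element of the iterable you pass in, and those elements are still lists. A minimal sketch of the two usual fixes, using plain `Counter` to stand in for `FreqDist` and a tiny stand-in for `tri_grams`:

```python
from collections import Counter
from itertools import chain

# Small stand-in for the tri_grams structure: a list of lists of strings.
tri_grams = [['a_b', 'c_d'], ['a_b', 'e_f']]

# tuple(tri_grams) still yields a tuple *of lists*, and lists are unhashable.
# Fix 1: flatten, so the counted elements are strings.
flat_counts = Counter(chain.from_iterable(tri_grams))
print(flat_counts.most_common())  # counts the individual n-gram strings

# Fix 2: convert each *inner* list to a tuple, to count whole documents.
doc_counts = Counter(tuple(doc) for doc in tri_grams)
print(doc_counts.most_common())
```

The same two calls work with `nltk.FreqDist` in place of `Counter`, since it accepts any iterable of hashable samples.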

I completely changed the way I did it, and now it works; it may be useful to people working with n-grams:

import gensim
from collections import Counter

# Build phrase n-grams from the tokenized column df['stopwords']
bi_grams = []
bigram = gensim.models.phrases.Phrases(df['stopwords'], min_count=1, threshold=10)
vector = bigram[df['stopwords']]
for t in vector:
    bi_grams.append(t)

# Keep only the tokens that gensim joined with an underscore
frequencies = []
item = "_"
for i in range(len(bi_grams)):
    for j in range(len(bi_grams[i])):
        if item in bi_grams[i][j]:
            frequencies.append(bi_grams[i][j])

# Stringify the list and split it back into items, then count
frequencies = str(frequencies)
split_it = frequencies.split(", ")
counter = Counter(split_it)
most_occur = counter.most_common(100)
most_occur
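As an aside, the `str()`/`split()` round-trip is what leaves the stray quote characters in the keys of the output below; the counts can be obtained directly from the list of tokens. A sketch, assuming `bi_grams` is a list of token lists as produced by gensim above (a small literal stands in for it here):

```python
from collections import Counter
from itertools import chain

# Stand-in for the gensim output: a list of token lists.
bi_grams = [['offerta_invece', 'solo'], ['offerta_invece', 'prezzo_attuale']]

# Count only the underscore-joined bigram tokens; no stringify/split needed,
# so the keys are clean strings rather than quoted fragments.
counts = Counter(tok for tok in chain.from_iterable(bi_grams) if '_' in tok)
print(counts.most_common(100))
```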

And the output:

[("'offerta_invece'", 78),
 ("'️_recensione'", 60),
 ("'prezzo_precedente'", 51),
 ("'prezzo_attuale'", 50),
 ("'stare_risparmiare'", 49),
 ("'prezzo_scontare'", 36),
 ("'️offertare_amazon️'", 31),
 ("'offerta_sconto'", 30),
 ("'risparmio_acquistare'", 30),
 ("'ora_storico'", 30),
 ("'solo_invece'", 26),
 ("'prezzo_ridurre'", 23),
 ("' re_attenzione'", 22),
...
