TypeError: unhashable type: 'list' when using a Python set of strings

Question

I know there are several very similar answers to this exact question, but none of them really answers mine.

I'm trying to remove a set of stopwords and punctuation marks from a list of words to do some basic natural language processing.

from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from string import punctuation


text = "Hello there. I am currently typing Python. "
custom_stopwords = set(stopwords.words('english') + list(punctuation))

# tokenizes the text into sentences
sentences = sent_tokenize(text)

# tokenizes each sentence into a list of words
words = [word_tokenize(sentence) for sentence in sentences]
filtered_words = [word for word in words if word not in custom_stopwords]
print(filtered_words)

This raises TypeError: unhashable type: 'list' on the filtered_words line. Why is this error raised? I'm not supplying a list anywhere; I'm supplying a set, aren't I?

Note: I have read the SO posts about this exact error, but I still have the same problem. The accepted answer gives the following explanation:

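Sets require their items to be hashable. Of the types predefined by Python, only the immutable ones, such as strings, numbers, and tuples, are hashable. Mutable types, such as lists and dicts, are not hashable, because changing their contents would change the hash and break the lookup code.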
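The quoted explanation can be checked directly in the interpreter. The sketch below uses a small helper, `is_hashable` (a name introduced here for illustration, not a standard function), that simply tries `hash()` on a value. It also shows one nuance the quote glosses over: a tuple is hashable only if everything inside it is hashable.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable("python"))            # True  (str is immutable)
print(is_hashable(42))                  # True  (int is immutable)
print(is_hashable(("hello", "there")))  # True  (tuple of strings)
print(is_hashable(["hello", "there"]))  # False (list is mutable)
print(is_hashable({"a": 1}))            # False (dict is mutable)
print(is_hashable((["nested"],)))       # False (tuple that holds a list)

# A set literal hashes every element, so a list element is rejected.
set_error = None
try:
    mixed = {"hello", ("there",), ["python"]}
except TypeError as exc:
    set_error = str(exc)
print("set literal rejected:", set_error)
```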

I'm providing a set of strings here, so why is Python still complaining?

Edit: After reading more SO posts (which suggest using tuples), I changed the set object:

custom_stopwords = tuple(stopwords.words('english'))

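I also realized that I have to flatten my list, because word_tokenize(sentence) creates a list of lists, and the punctuation is not filtered out correctly (a list object will never be found in custom_stopwords, which contains only strings).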
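Flattening can be done with `itertools.chain.from_iterable`. The sketch below does not call NLTK: the nested `words` list is hand-written to mimic the list-of-lists shape that `[word_tokenize(s) for s in sentences]` returns, and `custom_stopwords` is a hypothetical three-word subset plus `string.punctuation`, standing in for the real NLTK stopword list. It also lowercases each token before the lookup, assuming the stopwords are stored in lowercase (as NLTK's English list is), so that "I" matches "i".

```python
from itertools import chain
from string import punctuation

# Hand-written stand-in for the tokenizer output: one inner list per sentence.
words = [
    ["Hello", "there", "."],
    ["I", "am", "currently", "typing", "Python", "."],
]

# Hypothetical small subset standing in for stopwords.words('english').
custom_stopwords = {"i", "am", "there"} | set(punctuation)

# Flatten the list of lists so every item is a single string token.
flat_words = list(chain.from_iterable(words))

# Each item is now a str, which is hashable, so the set lookup works.
filtered_words = [w for w in flat_words if w.lower() not in custom_stopwords]
print(filtered_words)  # ['Hello', 'currently', 'typing', 'Python']
```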

However, this still raises the question: why does Python consider a tuple hashable but not a set of strings? And why does the TypeError mention list?
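A short experiment answers both parts. The set itself is never hashed: `x in some_set` hashes `x`, the item being looked up, so the error message names the type of that item (`list`). A membership test on a tuple instead scans the elements and compares them with `==`, so a list operand is never hashed; it simply never equals any string. That is why switching to a tuple silenced the error without making the filter work. The variable names below are illustrative.

```python
stop_tuple = ("i", "am", "there")
stop_set = {"i", "am", "there"}
sentence_tokens = ["Hello", "there", "."]  # one sentence's tokens: a list

# Tuple membership compares each element with ==; nothing is hashed,
# and a list never equals a string, so the answer is simply False.
in_tuple = sentence_tokens in stop_tuple
print(in_tuple)  # False

# Set membership hashes the left-hand operand first. That operand is a
# list, so the TypeError reports 'list' even though the set holds strings.
set_error = None
try:
    _ = sentence_tokens in stop_set
except TypeError as exc:
    set_error = str(exc)
print("set lookup failed:", set_error)

# With a single string token, both containers agree.
print("there" in stop_tuple, "there" in stop_set)  # True True
```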

Answer 1

words is a list of lists, because word_tokenize() returns a list of words.

When you run [word for word in words if word not in custom_stopwords], each word is actually of type list. To evaluate word not in custom_stopwords against a set, Python has to hash word, and that fails because lists are mutable containers and are not hashable in Python.
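If the per-sentence grouping should be kept rather than flattened, a nested comprehension also avoids the error: the outer loop walks the sentence lists, and only the inner loop's string tokens are looked up in the set. This is a sketch under stated assumptions: the token lists and the three-word stopword subset are hand-written stand-ins for the NLTK output, not real NLTK calls, and tokens are lowercased on the assumption that the stopwords are stored in lowercase.

```python
from string import punctuation

# Hand-written stand-in mirroring the list-of-lists shape that
# [word_tokenize(s) for s in sentences] produces.
words = [
    ["Hello", "there", "."],
    ["I", "am", "currently", "typing", "Python", "."],
]

# Hypothetical stand-in for set(stopwords.words('english') + list(punctuation)).
custom_stopwords = {"i", "am", "there"} | set(punctuation)

# Outer loop: sentences (lists). Inner loop: tokens (strings).
# Only strings reach the set lookup, so nothing unhashable is hashed.
filtered_per_sentence = [
    [token for token in sentence if token.lower() not in custom_stopwords]
    for sentence in words
]
print(filtered_per_sentence)  # [['Hello'], ['currently', 'typing', 'Python']]
```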

These posts may help you understand what "hashable" means and why mutable containers are not hashable:

Solution 1 (accepted, score 4), 2017-12-26 18:08:37

類型錯誤：不可哈希類型：使用Python字符串集時的列表

問題描述

1 個解決方案

解決方案1 4 已采納 2017-12-26 18:08:37

解決方案1
4 已采納 2017-12-26 18:08:37