TypeError: unhashable type: 'list' when using a Python set of strings

Question

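I know there are several very similar questions about this exact error, but none of them really answers mine.

I am trying to remove a set of stopwords and punctuation marks from a list of words in order to do some basic natural language processing.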

from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from string import punctuation

text = "Hello there. I am currently typing Python. "
custom_stopwords = set(stopwords.words('english') + list(punctuation))

# split the text into sentences
sentences = sent_tokenize(text)

# tokenize each sentence into a list of words (one list per sentence)
words = [word_tokenize(sentence) for sentence in sentences]
# the next line raises TypeError: unhashable type: 'list'
filtered_words = [word for word in words if word not in custom_stopwords]
print(filtered_words)

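This raises TypeError: unhashable type: 'list' on the filtered_words line. Why is this error raised? I am not supplying a list at all; I am supplying a set.

Note: I have read the SO post about this exact error and still have the same question. The accepted answer there gives the following explanation: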

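Sets require their items to be hashable. Of the types predefined by Python, only the immutable ones, such as strings, numbers, and tuples, are hashable. Mutable types, such as lists and dicts, are not hashable because changing their contents would change the hash and break the lookup code.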
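The rule quoted above can be checked directly with the built-in hash(). The following is a minimal sketch; the helper name is_hashable and the sample values are made up for illustration and do not come from the original post.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# One sample value per built-in type mentioned in the quote.
samples = ["hello", 42, ("a", "b"), ["a", "b"], {"a": 1}]

# Map each type name to whether a value of that type could be hashed.
results = {type(s).__name__: is_hashable(s) for s in samples}
print(results)
```

Strings, numbers, and tuples of hashable items pass the check; lists and dicts do not.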

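I am supplying a set of strings here, so why is Python still complaining?

Edit: after reading more SO posts (which suggested using tuples), I changed the collection object: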

custom_stopwords = tuple(stopwords.words('english'))

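I also realized that I had to flatten my list, because word_tokenize(sentence) for each sentence produces a list of lists, and the punctuation would not be filtered out correctly (a list object will never be found in custom_stopwords, which is a collection of strings).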
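A minimal sketch of the flattening step might look like the following. To keep it runnable without NLTK data, the tokenized sentences are written out by hand and the stopword list is a tiny hypothetical one; both are assumptions standing in for the output of word_tokenize() and stopwords.words('english'). Lowercasing each word before the lookup is also an addition here, made because the stopwords are lowercase.

```python
from string import punctuation

# Hand-written stand-in for [word_tokenize(s) for s in sent_tokenize(text)]:
# one inner list of tokens per sentence.
tokenized_sentences = [
    ["Hello", "there", "."],
    ["I", "am", "currently", "typing", "Python", "."],
]

# Hypothetical, very small stopword list (not NLTK's) plus punctuation.
custom_stopwords = set(["i", "am", "there"] + list(punctuation))

# The nested comprehension walks every sentence, then every word in it,
# so each item tested against the set is a string, not a list.
filtered_words = [
    word
    for sentence in tokenized_sentences
    for word in sentence
    if word.lower() not in custom_stopwords
]
print(filtered_words)  # ['Hello', 'currently', 'typing', 'Python']
```

With strings on the left-hand side of the membership test, the set works as intended and no tuple conversion is needed.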

However, this still raises the question: why does Python consider a tuple acceptable here, but not a set of strings? And why does the TypeError mention list?

Answer 1

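words is a list of lists, because word_tokenize() returns a list of words.

When you run [word for word in words if word not in custom_stopwords], each word is actually of type list. To evaluate word not in custom_stopwords, Python has to check whether word is in the set, which requires hashing word. That fails because lists are mutable containers and are not hashable in Python.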
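The difference between the set and the tuple can be sketched as follows. The values are made up for illustration. The point is that a membership test against a set hashes the left-hand operand, while a membership test against a tuple only compares items one by one, so it never raises here but also never matches a list against a string.

```python
stop_set = {"am", "."}
stop_tuple = ("am", ".")

# A whole tokenized sentence, i.e. what each `word` really was in the question.
word = ["I", "am", "."]

# Set membership must hash `word`, which fails for a list.
try:
    _ = word in stop_set
    set_error = None
except TypeError as exc:
    set_error = str(exc)

# Tuple membership compares `word` to each item; a list never equals a string,
# so the result is False and no exception is raised.
in_tuple = word in stop_tuple

print(set_error)
print(in_tuple)
```

This is why switching to a tuple silences the error without fixing the filter: the test simply returns False for every inner list, so nothing is removed.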

These posts may help in understanding what "hashable" means and why mutable containers are not hashable:

Solution 1
4 · Accepted · 2017-12-26 18:08:37