
NLTK Stopword List

Original question, 2014-03-31 13:45:24. Tags: python / nltk / stop-words

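I have the code below, where I am trying to apply a stopword list to a list of words. However, the results still show words such as "a" and "the", which I thought this process would have removed. Any ideas on what has gone wrong would be great.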

import nltk
from nltk.corpus import stopwords

word_list = open("xxx.y.txt", "r")
filtered_words = [w for w in word_list if not w in stopwords.words('english')]
print(filtered_words)

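1 answer

There are a few things of note.

If you are going to check membership against a list over and over, I would use a set instead of a list.
stopwords.words('english') returns a list of lowercase stop words. Your source very likely contains capital letters, so those words do not match.
You are not reading the file properly: you are iterating over the file object, which yields whole lines, rather than over a list of words split on whitespace.

Putting it all together: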
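The three points above can be seen in a small, self-contained sketch. It is only an illustration: `STOPS` is a tiny hand-picked stand-in for `stopwords.words('english')` (an assumption, so the snippet runs without the NLTK corpus), `io.StringIO` stands in for the opened file, and the function names `naive_filter` and `fixed_filter` are hypothetical.

```python
import io

# Assumption: a tiny stand-in for stopwords.words('english'), so this
# sketch runs without downloading the NLTK stopwords corpus.
STOPS = {"a", "the", "is", "on"}


def naive_filter(lines):
    """Mimic the question's approach: test each whole line, case-sensitively."""
    return [w for w in lines if w not in STOPS]


def fixed_filter(lines):
    """Split each line into words and compare them in lowercase."""
    return [w for line in lines for w in line.split() if w.lower() not in STOPS]


sample = "The cat is on a mat\nA dog barked\n"

# Iterating a file-like object yields whole lines (newline included),
# so no line ever equals a single stopword and nothing is removed.
naive = naive_filter(io.StringIO(sample))

# Splitting on whitespace and lowercasing makes the stopwords match.
fixed = fixed_filter(io.StringIO(sample))

print(naive)
print(fixed)
```

Here the naive version returns both lines unchanged, while the fixed version keeps only the non-stopword tokens.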

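from nltk.corpus import stopwords

# Build the stopword set once; set membership tests are fast.
stops = set(stopwords.words('english'))

# Iterating over a file yields lines, so split each line into words.
with open("xxx.y.txt", "r") as word_list:
    for line in word_list:
        for w in line.split():
            # The NLTK list is lowercase, so compare in lowercase.
            if w.lower() not in stops:
                print(w)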
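If the goal is a `filtered_words` list, as in the question, rather than printed output, the same logic can be wrapped in a helper. This is a minimal sketch, not part of the original answer: the function name `remove_stopwords`, the demo file contents, and the short `demo_stops` list are all assumptions made for illustration.

```python
import os
import tempfile


def remove_stopwords(path, stopword_iterable):
    """Return the words in the file at `path` that are not stopwords."""
    # Normalise the stopwords to a lowercase set for fast membership tests.
    stops = {s.lower() for s in stopword_iterable}
    with open(path, "r", encoding="utf-8") as handle:
        # Each line is split on whitespace; words are compared in lowercase.
        return [w for line in handle for w in line.split() if w.lower() not in stops]


# Demo with a throwaway file and a hypothetical three-word stopword list.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("The quick brown fox\njumps over a lazy dog\n")
    demo_path = tmp.name

demo_stops = ["a", "the", "over"]
filtered_words = remove_stopwords(demo_path, demo_stops)
os.remove(demo_path)

print(filtered_words)
```

With NLTK installed and the corpus fetched via `nltk.download('stopwords')`, `stopwords.words('english')` can be passed as the second argument. Note that `str.split()` leaves punctuation attached, so a token such as "the," would not match the stopword "the".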

Related questions

NLTK stopword removal issue
Stopword removal with NLTK and Pandas
Stopword removal with NLTK
How to extend the stopword list from NLTK and remove stop words with the extended list?
stopword removal in python list
Sklearn - How to add custom stopword list from txt file
Convert NLTK LazySubsequence to a list
List of extra abbreviations for NLTK?
Grammar nltk for list in Python
nltk do not recognize list


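Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.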

