Removing Stopwords in Python

I'm trying to remove the stopwords from a user input string using the .join function. It looks like this:

while True:
    line = raw_input()
    if line.strip() == stopword:
        break
    remove_stopwords = ''.join(word for word in line.split() if word not in stop_words)

I've defined stop_words in a list at the top. The problem is that when I type in a string to have the stop words removed from, it only removes the first word and leaves the rest. Any help would be great. I'm new to this, so it's probably something stupid.

Here is a one-liner using the filter function:

" ".join(filter(lambda word: word not in stop_words, line.split()))

Additionally, consider storing your stop words in a set rather than a list. The average algorithmic complexity of the membership test (in) is constant for a set and linear for a list.
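As a rough sketch of the set suggestion (written for Python 3; the helper name remove_stopwords and the sample stop words are made up for illustration, not taken from the question):

```python
# Hypothetical stop-word collection; a set gives average O(1) membership tests,
# whereas a list would be scanned element by element.
STOP_WORDS = {"hi", "bye"}


def remove_stopwords(line, stop_words=STOP_WORDS):
    """Return line with every whitespace-separated word in stop_words dropped."""
    kept = [word for word in line.split() if word not in stop_words]
    # Join with a single space so the remaining words stay separated.
    return " ".join(kept)


print(remove_stopwords("hello hi my name is bye justin"))
```

The behaviour is the same with a list; only the cost of each lookup changes, which starts to matter once the stop-word collection grows large.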

Edit: Your program appears to work as expected once a space is used as the join string. This makes sense, as (x for x in y if f(x)) is roughly equivalent to filter:

  stop_words = set(["hi", "bye"])
  stopword = "DONE"
  while True:
      line = raw_input()
      if line.strip() == stopword:
          break
      print(" ".join(word for word in line.split() if word not in stop_words))

input:

hello hi my name is bye justin

output:

hello my name is justin

Your bug must be somewhere else in your program. What else are you doing?
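For what it's worth, here is a small Python 3 sketch (the variable names are illustrative only) that checks two things from this answer: the generator expression and filter keep the same words, and the join separator decides whether those words stay separated.

```python
stop_words = {"hi", "bye"}
line = "hello hi my name is bye justin"

# Generator-expression form, as in the question.
kept_genexp = [word for word in line.split() if word not in stop_words]

# filter form; in Python 3 filter returns an iterator, so wrap it in list().
kept_filter = list(filter(lambda word: word not in stop_words, line.split()))

# Same words survive either way.
print(kept_genexp == kept_filter)

# An empty separator glues the surviving words together...
joined_without_space = "".join(kept_genexp)
# ...while a single space keeps them apart.
joined_with_space = " ".join(kept_genexp)

print(joined_without_space)
print(joined_with_space)
```

With '' as the separator the stop words are still removed, but the result runs together as one token, which is easy to mistake for the filtering itself going wrong.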
