Removing phrases from a list using custom stop words

Question

I have two lists:

listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
stopwordlist = ['new', 'To']

I am trying to get something like this:

finalList = ['Moscow', 'Berlin', 'France']

What I have tried so far only works when an entire item matches a stop word:

listB = []
for item in listA:
    if item not in stopwordlist:
        listB.append(item)
    else:
        continue
....            
....
    return listB

I could split each item and check its words against stopwordlist, but that seems like too many workarounds. Or I could use a regex with re.match.

Answer 1

Here is one way to do this:

>>> listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
>>> stopwordlist = ['new', 'To']
>>> finalList = [i for i in listA if not any(j.lower() in i.lower() for j in stopwordlist)]
>>> finalList
['Moscow', 'Berlin', 'France']

Or you could use the built-in filter function:

>>> listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
>>> stopwordlist = ['new', 'To']
>>> list(filter(lambda x: not any(j.lower() in x.lower() for j in stopwordlist), listA))
['Moscow', 'Berlin', 'France']
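
Note that both versions above test for substrings, so a stop word also matches when it appears inside a longer word. A minimal sketch of the difference follows; the entry 'Stockholm' is a made-up addition for illustration and is not part of the question's data.

```python
# 'Stockholm' is a hypothetical extra entry, added only to show the edge case.
listA = ['New Delhi', 'Moscow', 'Stockholm', 'To Washington']
stopwordlist = ['new', 'To']

# Substring test, as in the answer above: 'to' occurs inside 'stockholm'.
by_substring = [i for i in listA
                if not any(j.lower() in i.lower() for j in stopwordlist)]

# Whole-word test: compare the stop words with the individual words of each phrase.
stop_words = {j.lower() for j in stopwordlist}
by_word = [i for i in listA
           if not any(w in stop_words for w in i.lower().split())]

print(by_substring)  # ['Moscow']
print(by_word)       # ['Moscow', 'Stockholm']
```

Whether the substring behaviour is acceptable depends on the data; for the list in the question both tests give the same result.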

Answer 2

sl = tuple(i.lower() for i in stopwordlist)
[i for i in listA if not i.lower().startswith(sl)]

Output

['Moscow', 'Berlin', 'France']
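
One thing to keep in mind with startswith: it only looks at the beginning of the string, and it matches prefixes rather than whole words. The sketch below illustrates this with two hypothetical entries ('Toronto' and 'Flight To Washington') that are not in the question, and shows one possible variant that checks only the first whole word instead.

```python
# 'Toronto' and 'Flight To Washington' are hypothetical entries for illustration.
listA = ['New Delhi', 'Toronto', 'Flight To Washington', 'Moscow']
stopwordlist = ['new', 'To']

sl = tuple(i.lower() for i in stopwordlist)

# Prefix test from the answer: 'toronto' starts with 'to', so it is dropped,
# while a stop word in the middle of a phrase goes unnoticed.
by_prefix = [i for i in listA if not i.lower().startswith(sl)]

# Variant: compare only the first whole word of each phrase.
# Slicing with [:1] avoids an IndexError on empty strings.
stop_words = set(sl)
by_first_word = [i for i in listA
                 if not stop_words.intersection(i.lower().split()[:1])]

print(by_prefix)      # ['Flight To Washington', 'Moscow']
print(by_first_word)  # ['Toronto', 'Flight To Washington', 'Moscow']
```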

Answer 3

listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
stopwordlist = ['new', 'To']
# Lowercase only the stop words; the phrases keep their original casing.
stop_words = {i.lower() for i in stopwordlist}

listB = []

for item in listA:
    # Keep the phrase only if none of its words is a stop word.
    flag = True
    for word in item.lower().split(' '):
        if word in stop_words:
            flag = False
            break
    if flag:
        listB.append(item)
print(listB)  # ['Moscow', 'Berlin', 'France']

Answer 4

You have to lowercase both the stop words and the phrases you check them against:

listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
stopwordlist = ['new', 'To']

stop_words = {e.lower() for e in stopwordlist}
finalList = [e for e in listA if not stop_words.intersection(e.lower().split())]

Or you can use a regex:

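# Requires the third-party regex package (pip install regex), not the standard-library re.
import regex as re

# \L<words> is the regex package's named-list syntax: it matches any item of `words`.
stop_words_regex = re.compile(r"\L<words>", words=stop_words)
finalList = [e for e in listA if not stop_words_regex.findall(e.lower())]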

Output:

['Moscow', 'Berlin', 'France']
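
If installing a third-party package is not an option, something similar can be sketched with the standard-library re module. This is an illustrative variant, not the approach from the answer above: it assumes stopwordlist is non-empty, and it adds \b word boundaries so that only whole words match.

```python
import re

listA = ['New Delhi', 'Moscow', 'Berlin', 'France', 'To Washington']
stopwordlist = ['new', 'To']

# Build one alternation such as \b(?:new|To)\b; re.escape guards against
# stop words that contain regex metacharacters.
# Assumes stopwordlist is not empty (an empty alternation would match everywhere).
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in stopwordlist) + r")\b",
    re.IGNORECASE,  # case-insensitive, so no explicit lower() is needed
)

finalList = [e for e in listA if not pattern.search(e)]
print(finalList)  # ['Moscow', 'Berlin', 'France']
```

search is used rather than match so that a stop word anywhere in the phrase is found, not only at the start.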

4 answers

Answer 1
Score 2, accepted, 2019-09-06 06:46:59

Answer 2
Score 2, 2019-09-06 06:47:53

Answer 3
Score 1, 2019-09-06 09:08:38

Answer 4
Score 0, 2019-09-06 06:56:13
