Removing words from a list in Python
I have a list of strings, 'abc', and I am trying to remove from it the words that appear in another list, 'stop', as well as all the digits present in 'abc'.
abc=[ 'issues in performance 421',
'how are you doing',
'hey my name is abc, 143 what is your name',
'attention pleased',
'compliance installed 234']
stop=['attention', 'installed']
I am using a list comprehension to remove them, but the code below does not remove the stop words.
new_word=[word for word in abc if word not in stop ]
Result (the word 'attention' is still present):
['issues in performance',
'how are you doing',
'hey my name is abc, what is your name',
'attention pleased',
'compliance installed']
Desired output:
['issues in performance',
'how are you doing',
'hey my name is abc, what is your name',
'pleased',
'compliance']
Thanks
You need to split each phrase into words, filter out the words that are in stop, and then re-join the remaining words into phrases.
[' '.join(w for w in p.split() if w not in stop) for p in abc]
This outputs (note that this snippet removes only the stop words, not the digits):
['issues in performance 421', 'how are you doing', 'hey my name is abc, 143 what is your name', 'pleased', 'compliance 234']
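To also drop the standalone numbers that the question asks about, the same split/filter/join idea can be extended with a str.isdigit check. The following is only a minimal sketch: the helper name remove_words is hypothetical and not part of the original answer, and it assumes that words are separated by whitespace.

```python
def remove_words(phrases, stop_words):
    """Drop stop words and purely numeric tokens from each phrase."""
    stop_set = set(stop_words)  # a set makes each membership test cheap
    return [
        ' '.join(w for w in p.split() if w not in stop_set and not w.isdigit())
        for p in phrases
    ]


abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

result = remove_words(abc, stop)
print(result)
# ['issues in performance', 'how are you doing',
#  'hey my name is abc, what is your name', 'pleased', 'compliance']
```

Tokens with attached punctuation, such as 'abc,', are compared as-is, so a stop word followed directly by a comma would not be removed by this sketch.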
Using set is enough to solve this. Because each item may contain more than one word, you can't test the whole item with in. Instead, use set with & to get the words the item has in common with stop. If the item shares any word with stop, the intersection is non-empty and therefore truthy. Because you only care about the remaining items, we can use if not here.
new_word=[word for word in abc if not set(word.split(' ')) & set(stop)]
UPDATE
If you also want to drop every item that contains a number, you can simply do it like this:
new_word=[word for word in abc if not (set(word.split(' ')) & set(stop) or any([i.strip().isdigit() for i in word.split(' ')]))]
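It is worth noting that the set intersection approach filters whole items: a phrase that contains a stop word (or, in the update, a number) is dropped entirely rather than having only that word removed, which differs from the desired output in the question. The sketch below illustrates this on the question's data. The names kept and kept_no_digits are illustrative, and set.isdisjoint is used as an equivalent way to write the "no common word" test.

```python
abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']
stop_set = set(stop)

# Keep only the phrases that share no word with the stop list.
kept = [p for p in abc if stop_set.isdisjoint(p.split())]

# Additionally drop phrases that contain a purely numeric token.
kept_no_digits = [
    p for p in abc
    if stop_set.isdisjoint(p.split())
    and not any(w.isdigit() for w in p.split())
]

print(kept)
# ['issues in performance 421', 'how are you doing',
#  'hey my name is abc, 143 what is your name']
print(kept_no_digits)
# ['how are you doing']
```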
Here is a solution using a simple regular expression with the re.sub method. This solution removes numbers as well.
import re
abc=[ 'issues in performance 421',
'how are you doing',
'hey my name is abc, 143 what is your name',
'attention pleased',
'compliance installed 234']
# raw strings avoid invalid-escape warnings for \s in newer Python versions
stop=[r'attention\s+', r'installed\s+', r'[0-9]']
[re.sub('|'.join(stop), '', x) for x in abc]
Output:
['issues in performance ',
'how are you doing',
'hey my name is abc,  what is your name',
'pleased',
'compliance ']
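As the output shows, plain substitution can leave stray whitespace behind. One possible variant, given here only as a sketch, matches whole words with \b anchors, escapes the stop words with re.escape, and normalizes whitespace afterwards with split and join. The names pattern and cleaned are illustrative and not part of the original answer.

```python
import re

abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

# \b keeps 'installed' from matching inside a longer word such as
# 'reinstalled'; re.escape guards against regex metacharacters in stop words.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, stop)) + r'|\d+)\b')

# Remove the matches, then collapse the leftover runs of whitespace.
cleaned = [' '.join(pattern.sub('', p).split()) for p in abc]

print(cleaned)
# ['issues in performance', 'how are you doing',
#  'hey my name is abc, what is your name', 'pleased', 'compliance']
```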
list1 = []
for phrase in abc:
    # start from the original phrase and strip each stop word in turn
    cleaned = phrase
    for remove_word in stop:
        cleaned = cleaned.replace(remove_word, '')
    list1.append(cleaned)
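One caveat with str.replace is that it works on substrings rather than whole words, so it can also cut a stop word out of the middle of a longer word. The sketch below compares it with token-based filtering; the sample phrase is made up for illustration and does not come from the question.

```python
stop = ['attention', 'installed']
phrase = 'reinstalled the attention module'  # hypothetical sample phrase

# Substring replacement: 'installed' is also removed from 'reinstalled'.
by_replace = phrase
for s in stop:
    by_replace = by_replace.replace(s, '')

# Token-based filtering: only whole words equal to a stop word are removed.
by_tokens = ' '.join(w for w in phrase.split() if w not in stop)

print(repr(by_replace))  # 're the  module'
print(repr(by_tokens))   # 'reinstalled the module'
```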
This is how I'd do it, at least:
abc=[ 'issues in performance 421',
'how are you doing',
'hey my name is abc, 143 what is your name',
'attention pleased',
'compliance installed 234'
]
stop=['attention', 'installed']
for i, elem in enumerate(abc):
    # keep only the words that are neither stop words nor pure numbers
    abc[i] = " ".join(filter(lambda w: w not in stop and not w.isdigit(), elem.split()))
print(abc)
Result:
['issues in performance',
'how are you doing',
'hey my name is abc, what is your name',
'pleased',
'compliance']
Hope it helps in any way :)