Removing words from list in Python
I have a list "abc" (of strings), and I am trying to remove from it the words that appear in the list "stop", as well as all the numbers that appear in abc.
abc=[ 'issues in performance 421',
'how are you doing',
'hey my name is abc, 143 what is your name',
'attention pleased',
'compliance installed 234']
stop=['attention', 'installed']
I am using a list comprehension to remove them, but the code below does not remove the words.
new_word=[word for word in abc if word not in stop ]
Result (the word "attention" is still there):
['issues in performance',
'how are you doing',
'hey my name is abc, what is your name',
'attention pleased',
'compliance installed']
Desired output:
['issues in performance',
'how are you doing',
'hey my name is abc, what is your name',
'pleased',
'compliance']
Thanks
You need to split each phrase into words, filter out the stop words, and then join the remaining words back into a phrase.
[' '.join(w for w in p.split() if w not in stop) for p in abc]
Output:
['issues in performance 421', 'how are you doing', 'hey my name is abc, 143 what is your name', 'pleased', 'compliance 234']
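The split/filter/join approach above removes the stop words but keeps the numbers. A minimal sketch of one way to also drop purely numeric tokens, assuming tokens are separated by whitespace; the helper name clean_phrase is hypothetical. Converting stop to a set first makes each membership test a hash lookup instead of a list scan.

```python
abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

# Build the set once so each "in" check is a hash lookup.
stop_set = set(stop)

def clean_phrase(phrase, stop_words):
    """Hypothetical helper: drop stop words and purely numeric tokens."""
    kept = [w for w in phrase.split()
            if w not in stop_words and not w.isdigit()]
    return ' '.join(kept)

new_word = [clean_phrase(p, stop_set) for p in abc]
print(new_word)
# ['issues in performance', 'how are you doing',
#  'hey my name is abc, what is your name', 'pleased', 'compliance']
```

Because str.isdigit tests the whole token, a number with punctuation attached (for example "143,") would be kept.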
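You can solve this just by using a set. Because each item may contain several words, you can't simply use "in". Instead, combine set with & to get the words the item has in common with stop. If there are common words, the intersection is a non-empty set, which is truthy. Since you only care about the remaining items, you can use "if not" here. Note that this keeps or drops whole strings: an item that contains a stop word is removed entirely, rather than having only that word stripped.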
new_word=[word for word in abc if not set(word.split(' ')) & set(stop)]
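The same intersection test can be written with set.isdisjoint, which returns a plain True/False and accepts any iterable, so the split result does not need to be converted to a set. A sketch, assuming the stop set is built once outside the comprehension; like the answer's code, it drops whole strings that contain a stop word.

```python
abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']
stop_set = set(stop)  # built once, not once per item

# A non-empty intersection is truthy: this item shares "attention" with stop_set.
print(set('attention pleased'.split()) & stop_set)  # {'attention'}

# isdisjoint() is True when the phrase shares no word with stop_set,
# so only items without any stop word are kept.
new_word = [p for p in abc if stop_set.isdisjoint(p.split())]
print(new_word)
# ['issues in performance 421', 'how are you doing',
#  'hey my name is abc, 143 what is your name']
```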
Update
If you also want to remove every item that contains a number, you can simply do:
new_word=[word for word in abc if not (set(word.split(' ')) & set(stop) or any([i.strip().isdigit() for i in word.split(' ')]))]
Here is a solution that uses a simple regular expression with the re.sub method. This solution also removes the numbers.
import re
abc=[ 'issues in performance 421',
'how are you doing',
'hey my name is abc, 143 what is your name',
'attention pleased',
'compliance installed 234']
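# raw strings keep "\s" from being treated as a string escape
stop=[r'attention\s+', r'installed\s+', r'[0-9]']
# join the patterns into one alternation and delete every match
[re.sub('|'.join(stop), '', x) for x in abc]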
Output:
['issues in performance ',
'how are you doing',
'hey my name is abc,  what is your name',
'pleased',
'compliance ']
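The regex answer leaves trailing and doubled spaces, and a bare pattern like "attention" would also match inside a longer word. A sketch of one possible refinement, assuming stop contains plain words: \b anchors each stop word to word boundaries, re.escape guards against regex metacharacters in the words, and split/join collapses the leftover whitespace. The helper name scrub is hypothetical.

```python
import re

abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

# Matches a whole stop word, or any run of digits.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, stop)) + r')\b|\d+')

def scrub(text):
    """Hypothetical helper: delete matches, then normalize whitespace."""
    removed = pattern.sub('', text)
    # split() with no argument also drops the empty pieces left behind
    return ' '.join(removed.split())

new_word = [scrub(x) for x in abc]
print(new_word)
# ['issues in performance', 'how are you doing',
#  'hey my name is abc, what is your name', 'pleased', 'compliance']
```

Unlike the token-based answers, \d+ also strips digits embedded in a word, so "abc123" becomes "abc".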
list1 = []
for word in abc:
    word1 = word
    for remove_word in stop:
        # apply each removal to the already-cleaned string,
        # not to the original, so earlier removals are kept
        word1 = word1.replace(remove_word, '')
    list1.append(word1)
This is what I would do, at least:
abc=[ 'issues in performance 421',
'how are you doing',
'hey my name is abc, 143 what is your name',
'attention pleased',
'compliance installed 234'
]
stop=['attention', 'installed']
for x, elem in enumerate(abc):
    # keep tokens that are neither stop words nor purely numeric
    abc[x] = " ".join(filter(lambda w: w not in stop and not w.isdigit(), elem.split()))
print(abc)
Result:
['issues in performance',
'how are you doing',
'hey my name is abc, what is your name',
'pleased',
'compliance']
Hope this helps in some way :)