
Removing words from a list in Python

I have a list "abc" of strings, and I am trying to remove from each string the words that appear in the list "stop", along with any numbers that appear in "abc".

abc=[ 'issues in performance 421',
 'how are you doing',
 'hey my name is abc, 143 what is your name',
 'attention pleased',
 'compliance installed 234']
stop=['attention', 'installed']

I am using a list comprehension to remove them, but the code below does not remove the words.

new_word=[word for word in abc if word not in stop ]

Result (note that the words are still there):

['issues in performance',
 'how are you doing',
 'hey my name is abc, what is your name',
 'attention pleased',
 'compliance installed']

Desired output:

 ['issues in performance',
     'how are you doing',
     'hey my name is abc, what is your name',
     'pleased',
     'compliance']

Thanks

You need to split each phrase into words, filter out the stop words, and then join the remaining words back into a phrase.

[' '.join(w for w in p.split() if w not in stop) for p in abc]

Output:

['issues in performance 421', 'how are you doing', 'hey my name is abc, 143 what is your name', 'pleased', 'compliance 234']
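As a minimal sketch of why the original comprehension leaves the list untouched while the split-and-join version works: each element of "abc" is a whole phrase, so it is never equal to a single stop word. The helper name remove_stop_words is hypothetical, and converting "stop" to a set is an assumption made here for faster lookups, not part of the answer above.

```python
abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

# Compares whole phrases against the stop list, so nothing is ever filtered out.
unchanged = [phrase for phrase in abc if phrase not in stop]


def remove_stop_words(phrases, stop_words):
    """Hypothetical helper: drop stop words from each phrase, word by word."""
    stop_set = set(stop_words)  # assumption: a set, for O(1) membership tests
    return [' '.join(w for w in p.split() if w not in stop_set)
            for p in phrases]


cleaned = remove_stop_words(abc, stop)
print(unchanged == abc)  # True
print(cleaned)
```

Numbers survive this version because only the stop words are filtered.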

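You can solve this with sets. Because each item may contain several words, you cannot use in directly on the item. Instead, combine set with & to get the words the item has in common with stop. If there are common words, the intersection is non-empty and therefore truthy. Since you only care about the remaining items, use if not here.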

new_word=[word for word in abc if  not set(word.split(' ')) & set(stop)]

Update

If you also want to remove every item that contains a number, just do the following:

new_word=[word for word in abc if  not (set(word.split(' ')) & set(stop) or any([i.strip().isdigit() for i in word.split(' ')]))]
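It may help to see what the set-intersection filter actually returns. It drops every phrase that contains a stop word (or, in the updated version, a number) instead of removing the word from the phrase. A short sketch under that reading, where the variable names kept and kept_no_digits are made up for illustration:

```python
abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']
stop_set = set(stop)

# Keep only phrases that share no word with the stop list.
kept = [p for p in abc if not (set(p.split()) & stop_set)]

# Additionally drop phrases that contain a purely numeric word.
kept_no_digits = [p for p in abc
                  if not (set(p.split()) & stop_set)
                  and not any(w.isdigit() for w in p.split())]

print(kept)
print(kept_no_digits)  # ['how are you doing']
```

So this approach fits when whole items should be discarded, which differs from the desired output in the question.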

Here is a solution that uses a simple regular expression with re.sub. This solution also removes the digits.

import re

abc=[ 'issues in performance 421',
 'how are you doing',
 'hey my name is abc, 143 what is your name',
 'attention pleased',
 'compliance installed 234']
# Raw strings keep \s from being treated as an invalid string escape.
stop=[r'attention\s+', r'installed\s+', r'[0-9]']

[re.sub('|'.join(stop), '', x) for x in abc]


Output:
['issues in performance ',
'how are you doing',
 'hey my name is abc,  what is your name',
 'pleased',
 'compliance ']
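The output above still carries trailing and doubled spaces. One possible variant, offered as a sketch and not as the answerer's method, anchors the pattern on word boundaries, escapes the stop words with re.escape, and normalizes whitespace afterwards. The function name clean is hypothetical.

```python
import re

abc = ['issues in performance 421',
       'how are you doing',
       'hey my name is abc, 143 what is your name',
       'attention pleased',
       'compliance installed 234']
stop = ['attention', 'installed']

# \b matches only at word boundaries; \d+ matches a standalone run of digits.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, stop)) + r'|\d+)\b')


def clean(phrase):
    """Hypothetical helper: strip stop words and numbers, then tidy spaces."""
    without = pattern.sub('', phrase)
    return ' '.join(without.split())  # collapse runs of whitespace


result = [clean(p) for p in abc]
print(result)
```

Because of the word boundaries, digits embedded in a word (for example "abc123") are left alone in this sketch.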
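An alternative using a plain loop and str.replace:

list1 = []
for phrase in abc:
    # Apply every stop word to the same phrase so earlier removals are kept.
    for remove_word in stop:
        phrase = phrase.replace(remove_word, '')
    # Trim the space left behind by a removed leading or trailing word.
    list1.append(phrase.strip())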
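One caveat with str.replace is that it removes substrings, not whole words. The sketch below uses a made-up phrase (not from the question) to contrast it with word-level filtering.

```python
stop = ['attention', 'installed']
phrase = 'uninstalled drivers need attention'  # hypothetical example input

# Substring removal: 'installed' is also cut out of 'uninstalled'.
replaced = phrase
for w in stop:
    replaced = replaced.replace(w, '')

# Word-level removal: only exact word matches are dropped.
filtered = ' '.join(w for w in phrase.split() if w not in stop)

print(repr(replaced))  # 'un drivers need '
print(repr(filtered))  # 'uninstalled drivers need'
```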

This is at least what I would do:

abc=[ 'issues in performance 421',
    'how are you doing',
    'hey my name is abc, 143 what is your name',
    'attention pleased',
    'compliance installed 234'
]
stop=['attention', 'installed']
for i, elem in enumerate(abc):
    abc[i] = " ".join(w for w in elem.split() if w not in stop and not w.isdigit())
print(abc)

Result:

['issues in performance',
    'how are you doing',
    'hey my name is abc, what is your name',
    'pleased',
    'compliance']

Hope this helps in some way :)
