Removing words that contain a given character or substring from text lines with Python
I have several lines of text and want to remove any word that contains a special character or a given fixed string (in Python).
Example:
in_lines = ['this is go:od',
            'that example is bad',
            'amp is a word']
# remove any word with {'amp', ':'}
out_lines = ['this is',
             'that is bad',
             'is a word']
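I know how to remove words that appear in a given list, but I cannot remove words that merely contain a special character or a short run of letters. Please let me know if I should add more information.
This is what I use to remove the selected words: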
def remove_stop_words(lines):
    stop_words = ['am', 'is', 'are']
    results = []
    for text in lines:
        tmp = text.split(' ')
        for stop_word in stop_words:
            for x in range(0, len(tmp)):
                # only exact matches are blanked out
                if tmp[x] == stop_word:
                    tmp[x] = ''
        results.append(" ".join(tmp))
    return results

out_lines = remove_stop_words(in_lines)
This gives your expected output, apart from the stray spaces left where words were blanked out:
def remove_stop_words(lines):
    stop_words = ['am', ':']
    results = []
    for text in lines:
        tmp = text.split(' ')
        for x in range(0, len(tmp)):
            for st_w in stop_words:
                # substring test: blank any word containing a stop string
                if st_w in tmp[x]:
                    tmp[x] = ''
        results.append(" ".join(tmp))
    return results
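To make the whitespace behaviour of the function above concrete, here is a minimal sketch that runs it on the question's input and then collapses the leftover spaces. The helper name squeeze_spaces is hypothetical and not part of the original answer.

```python
def remove_stop_words(lines):
    stop_words = ['am', ':']
    results = []
    for text in lines:
        tmp = text.split(' ')
        for x in range(len(tmp)):
            for st_w in stop_words:
                if st_w in tmp[x]:
                    tmp[x] = ''  # the word is blanked, not dropped
        results.append(" ".join(tmp))
    return results


def squeeze_spaces(line):
    # Hypothetical helper: split() with no argument discards empty
    # strings, so re-joining collapses runs of spaces and trims the ends.
    return " ".join(line.split())


in_lines = ['this is go:od',
            'that example is bad',
            'amp is a word']

raw = remove_stop_words(in_lines)
clean = [squeeze_spaces(line) for line in raw]
print(raw)    # ['this is ', 'that  is bad', ' is a word']
print(clean)  # ['this is', 'that is bad', 'is a word']
```

Note that 'am' is a broader filter than the 'amp' asked for in the question: it would also remove words such as 'am' or 'same'.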
in_lines = ['this is go:od',
            'that example is bad',
            'amp is a word']

def remove_words(in_list, bad_list):
    out_list = []
    for line in in_list:
        # keep only the words that contain none of the bad phrases
        words = ' '.join([word for word in line.split() if not any([phrase in word for phrase in bad_list])])
        out_list.append(words)
    return out_list

out_lines = remove_words(in_lines, ['amp', ':'])
print(out_lines)
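Odd as it may look, the expression
word for word in line.split() if not any([phrase in word for phrase in bad_list])
does all the hard work here in one go. For a single word, the inner list holds one True/False value per phrase in the "bad" list, telling whether that phrase occurs in the word. The any function then reduces this temporary list to a single True/False value; if it is False, the word contains none of the phrases and can safely be copied to the output for that line.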
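The intermediate values described above can be printed for a few sample words. This is only an illustrative sketch; the helper name contains_bad is hypothetical and does not appear in the answer.

```python
bad_list = ['amp', ':']

for word in ['this', 'go:od', 'example']:
    # one True/False flag per phrase in bad_list
    flags = [phrase in word for phrase in bad_list]
    print(word, flags, any(flags))
# this [False, False] False
# go:od [False, True] True
# example [True, False] True


def contains_bad(word, bad_list):
    # Hypothetical helper. Passing a generator expression instead of a
    # list lets any() stop at the first phrase that matches.
    return any(phrase in word for phrase in bad_list)
```

Dropping the square brackets inside any(...) gives the same result without building the temporary list.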
For example, removing every word that contains an a gives the following result:
remove_words(in_lines, ['a'])
>>> ['this is go:od', 'is', 'is word']
(The for line in .. loop can be removed as well. At that point, though, readability really starts to suffer.)
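One way the for line in .. loop might be folded away is a nested comprehension. This is a sketch of that idea, not code from the original answer, and it shows why readability becomes a concern.

```python
def remove_words(in_list, bad_list):
    # outer comprehension: one output string per input line
    # inner generator: the words of that line with no bad phrase in them
    return [
        ' '.join(word for word in line.split()
                 if not any(phrase in word for phrase in bad_list))
        for line in in_list
    ]


in_lines = ['this is go:od',
            'that example is bad',
            'amp is a word']

print(remove_words(in_lines, ['amp', ':']))
# ['this is', 'that is bad', 'is a word']
```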