Remove words containing a character or a string of letters from a text file using Python

Question

I have a few lines of text and want to remove any word that contains a special character or a fixed given string (in Python).

Example:

in_lines = ['this is go:od', 
            'that example is bad', 
            'amp is a word']

# remove any word with {'amp', ':'}
out_lines = ['this is', 
             'that is bad', 
             'is a word']

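I know how to remove words that appear in a given list, but I can't remove words that contain a special character or a few given letters. Please let me know if anything is unclear and I will add more information.

This is what I use to remove selected words: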

def remove_stop_words(lines):
    stop_words = ['am', 'is', 'are']
    results = []
    for text in lines:
        tmp = text.split(' ')
        for stop_word in stop_words:
            for x in range(0, len(tmp)):
                if tmp[x] == stop_word:
                    tmp[x] = ''
        results.append(" ".join(tmp))
    return results
out_lines = remove_stop_words(in_lines)

Answer 1

This matches your expected output:

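def remove_stop_words(lines):
  # Substrings that disqualify a word ('amp' and ':' as given in the question)
  stop_words = ['amp', ':']
  results = []
  for text in lines:
    tmp = text.split(' ')
    for x in range(0, len(tmp)):
      for st_w in stop_words:
        # Substring test: blank the word if it contains the stop word anywhere
        if st_w in tmp[x]:
          tmp[x] = ''
    # Skip the blanked words so no stray spaces are left in the result
    results.append(" ".join(w for w in tmp if w))
  return results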
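
The substantive change from the code in the question is the comparison: `==` only matches when the whole word equals the stop word, while `in` matches a substring anywhere inside the word. A minimal sketch of the difference, using words from the question's sample data:

```python
word = 'go:od'

# Exact comparison: only true when the whole word equals the stop word
print(word == ':')          # False

# Substring test: true when the stop word occurs anywhere in the word
print(':' in word)          # True
print('amp' in 'example')   # True, so 'example' is removed as well
```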

Answer 2

in_lines = ['this is go:od', 
            'that example is bad', 
            'amp is a word']

def remove_words(in_list, bad_list):
    out_list = []
    for line in in_list:
        words = ' '.join([word for word in line.split() if not any([phrase in word for phrase in bad_list]) ])
        out_list.append(words)
    return out_list

out_lines = remove_words(in_lines, ['amp', ':'])
print(out_lines)

Odd as it may sound, the statement

word for word in line.split() if not any([phrase in word for phrase in bad_list])

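does all the hard work here at once. It creates a temporary list of True/False values, one for each phrase in the "bad" list, applied to a single word. The any function then condenses this temporary list into a single True/False value, and if that value is False, the word can safely be copied to the line-based output list.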
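
To make the intermediate step concrete, here is a small sketch that prints the temporary list for a few individual words. The helper `match_flags` and the variable `flags` are illustrative names only and are not part of the answer's code.

```python
bad_list = ['amp', ':']

def match_flags(word, bad_list):
    # Hypothetical helper: one True/False entry per phrase in bad_list
    return [phrase in word for phrase in bad_list]

for word in ['go:od', 'example', 'bad']:
    flags = match_flags(word, bad_list)
    # any() is True as soon as at least one entry is True
    print(word, flags, any(flags))

# go:od [False, True] True
# example [True, False] True
# bad [False, False] False
```

Only 'bad' ends up with any(...) equal to False, so it is the only one of the three words that would be kept.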

For example, removing every word that contains an a gives the following result:

>>> remove_words(in_lines, ['a'])
['this is go:od', 'is', 'is word']

(It is possible to fold the for line in .. loop into the expression as well. At that point, though, readability really starts to suffer.)
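
For reference, a sketch of what that single-expression version might look like. The name `remove_words_compact` is made up here to avoid clashing with the function above; the behaviour is intended to be the same.

```python
def remove_words_compact(in_list, bad_list):
    # Outer comprehension walks the lines, inner generator filters the words
    return [
        ' '.join(word for word in line.split()
                 if not any(phrase in word for phrase in bad_list))
        for line in in_list
    ]

in_lines = ['this is go:od',
            'that example is bad',
            'amp is a word']

print(remove_words_compact(in_lines, ['amp', ':']))
# ['this is', 'that is bad', 'is a word']
```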


Answer 1
1 2018-10-10 15:36:14

Answer 2
1 Accepted 2018-10-10 15:37:40
