Using Python to remove words that contain given characters or substrings from lines of text

Question

I have a few lines of text and want to remove any word that contains a special character or a fixed, given string (in Python).

Example:

in_lines = ['this is go:od', 
            'that example is bad', 
            'amp is a word']

# remove any word with {'amp', ':'}
out_lines = ['this is', 
             'that is bad', 
             'is a word']

I know how to remove words that appear in a given list, but I can't work out how to remove words that contain a special character or a given sequence of letters. Let me know if anything is unclear and I'll add more information.

This is what I have for removing selected words:

def remove_stop_words(lines):
    stop_words = ['am', 'is', 'are']
    results = []
    for text in lines:
        tmp = text.split(' ')
        for stop_word in stop_words:
            for x in range(0, len(tmp)):
                if tmp[x] == stop_word:
                    tmp[x] = ''
        results.append(" ".join(tmp))
    return results

out_lines = remove_stop_words(in_lines)

Answer 1

This matches your expected output:

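def remove_stop_words(lines):
    # Substrings that disqualify a word; 'amp' and ':' are the ones from the question.
    stop_words = ['amp', ':']
    results = []
    for text in lines:
        tmp = text.split(' ')
        for x in range(0, len(tmp)):
            for st_w in stop_words:
                if st_w in tmp[x]:
                    # Blank out any word that contains a stop substring.
                    tmp[x] = ''
        # Drop the blanked entries so no stray spaces are left behind.
        results.append(" ".join(word for word in tmp if word))
    return results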

Answer 2

in_lines = ['this is go:od', 
            'that example is bad', 
            'amp is a word']

def remove_words(in_list, bad_list):
    out_list = []
    for line in in_list:
        words = ' '.join([word for word in line.split() if not any([phrase in word for phrase in bad_list])])
        out_list.append(words)
    return out_list

out_lines = remove_words(in_lines, ['amp', ':'])
print(out_lines)

Strange as it sounds, the statement

word for word in line.split() if not any([phrase in word for phrase in bad_list])

does all the hard work here at once. For a single word, the inner list comprehension creates a list of True/False values, one for each phrase in the "bad" list. The any function condenses this temporary list into a single True/False value, and if that value is False, the word can safely be copied into the output for that line.
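To make the intermediate step visible, here is a small sketch that prints the temporary list for each word of one line. The helper name phrase_flags is only for illustration and is not part of the answer's code.

```python
def phrase_flags(word, bad_list):
    # One True/False entry per phrase in bad_list, for this single word.
    return [phrase in word for phrase in bad_list]

bad_list = ['amp', ':']
line = 'that example is bad'

for word in line.split():
    flags = phrase_flags(word, bad_list)
    # any() collapses the list of flags to a single boolean.
    print(word, flags, any(flags))
```

Only 'example' produces a True entry (for 'amp'), so it is the only word dropped from that line.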

As an example, the result of removing all words containing an "a" looks like this:

>>> remove_words(in_lines, ['a'])
['this is go:od', 'is', 'is word']

(It is possible to remove the "for line in ..." line as well. At that point, readability really starts to suffer, though.)
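For the curious, a minimal sketch of what that single-expression version might look like. The name remove_words_compact is hypothetical, and generator expressions are used in place of the temporary lists; the result should be the same as with remove_words.

```python
def remove_words_compact(in_list, bad_list):
    # Outer comprehension: one output string per input line.
    # Inner generator: keep only words containing none of the bad phrases.
    return [
        ' '.join(word for word in line.split()
                 if not any(phrase in word for phrase in bad_list))
        for line in in_list
    ]

in_lines = ['this is go:od',
            'that example is bad',
            'amp is a word']

print(remove_words_compact(in_lines, ['amp', ':']))
# ['this is', 'that is bad', 'is a word']
```

It works, but the two nested loops in one expression are harder to scan than the explicit loop above.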
