
Remove list of phrases from string

I have an array of phrases:

bannedWords = ['hi', 'hi you', 'hello', 'and you']

I want to take a sentence like "hi, how are tim and you doing" and get this:

", how are tim doing"

Exact case matching is OK. Sorry, I should have clarified.

Since you want to remove extra spaces as well, the regex below should work better:

import re

s = "Hi, How are Tim and you doing"
bannedWords = ['hi', 'hi you', 'hello', 'and you']
for phrase in bannedWords:
    # Remove the phrase plus any whitespace that follows it, ignoring case
    s = re.sub(phrase + r"\s*", '', s, flags=re.I)
print(s)
# , How are Tim doing

You can use re.sub with the re.I flag to do this in a case-insensitive manner.

import re

bannedWords = ['hi', 'hi you', 'hello', 'and you']
sentence = "Hi, how are Tim and you doing"

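# Group the alternatives so the trailing \s* applies to every phrase, not only the last one
pattern = r'(?:' + '|'.join(bannedWords) + r')\s*'
new_sentence = re.sub(pattern, '', sentence, flags=re.I)
# new_sentence: ", how are Tim doing"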
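
One caveat when phrases are joined with |: the regex engine tries the alternatives left to right, so 'hi' matches before 'hi you' is ever attempted, and the longer phrase is only partly removed. A minimal sketch, assuming longer phrases should win and that each phrase is meant literally (build_pattern is a hypothetical helper name, not part of the answer above):

```python
import re

def build_pattern(phrases):
    # Hypothetical helper: try longer phrases first so 'hi you' beats 'hi',
    # and escape each phrase so regex metacharacters are matched literally.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = '|'.join(re.escape(p) for p in ordered)
    return re.compile(r'(?:' + alternation + r')\s*', flags=re.I)

bannedWords = ['hi', 'hi you', 'hello', 'and you']
pattern = build_pattern(bannedWords)

print(pattern.sub('', "Hi you, how are Tim and you doing"))
# , how are Tim doing
```

Without the sort, the same input would leave a stray "you" behind, because only the leading "Hi" would be removed.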

With regex you can join the words you want to remove with |. We also want to replace any run of multiple whitespace characters with a single space. This way we only need two operations.

import re

def remove_banned(s, words):
    pattern = '|'.join(words)
    s = re.sub(pattern, '', s, flags=re.I)  # remove banned phrases
    s = re.sub(r'\s+', ' ', s)  # collapse runs of whitespace into one space
    return s

bannedWords = ['hi', 'hi you', 'hello', 'and you']
s = "Hi, How are Tim and you doing"

print(remove_banned(s,bannedWords))

Returns:

, How are Tim doing
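
Note that a plain alternation also matches inside longer words, so 'hi' would be stripped out of "this". If only whole words should be removed, something like the following may work. It is a sketch under that assumption, and remove_banned_words is a hypothetical name:

```python
import re

def remove_banned_words(s, words):
    # Assumption: phrases should match only as whole words, ignoring case.
    # Longer phrases go first so 'hi you' is tried before 'hi'.
    ordered = sorted(words, key=len, reverse=True)
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in ordered) + r')\b'
    s = re.sub(pattern, '', s, flags=re.I)
    # Collapse the gaps left behind into single spaces
    return re.sub(r'\s+', ' ', s)

bannedWords = ['hi', 'hi you', 'hello', 'and you']
print(remove_banned_words("Hi, is this Tim and you doing", bannedWords))
# , is this Tim doing
```

The \b anchors only behave this way for phrases that start and end with word characters, which holds for the list in the question.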
