How to remove WHOLE words from a string in Python?
I'm trying to make a function to remove whole words from a string in Python, and I think I have something that does it:
import re

def remove_words_from_str(strn, word, replacement=' '):
    return re.sub(r'(\s*)' + word + r'(\s*)', replacement, strn)
The problem is that this takes pieces of words too, which I don't want.
EX: print( remove_words_from_str( "is this is a test ? yes this is ; this is", "is" ) )
OUT: th a test ? yes th ; th
Is there a way to only take whole words? (In other words, I don't want 'this' to become 'th', because the 'is' in 'this' is not a full word.)
Python's regex syntax supports \b, which matches a word boundary, so you can do:
re.sub(r'\s*\b' + word + r'\b\s*', replacement, strn)
You will still want to keep the greedy \s* quantifiers on both sides to replace all the surrounding spaces with a single space.
The output for your test case is
' this a test ? yes this ; this '
If you want to ensure that the first and last space are removed, use str.strip on the result:
def remove_words_from_str(strn, word, replacement=' '):
    return re.sub(r'\s*\b' + word + r'\b\s*', replacement, strn).strip()
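One thing the function above does not guard against is a word that contains regex metacharacters such as '.' or '+'. A minimal sketch of the same \b approach with re.escape applied first (the name remove_whole_word is hypothetical, not from the answer):

```python
import re

def remove_whole_word(text, word, replacement=' '):
    # re.escape makes characters like '.' in `word` match literally.
    pattern = r'\s*\b' + re.escape(word) + r'\b\s*'
    # Same substitution as the answer above, then trim the ends.
    return re.sub(pattern, replacement, text).strip()

print(remove_whole_word("is this is a test ? yes this is ; this is", "is"))
# Without re.escape, the pattern 'a.b' would also match 'axb'.
print(remove_whole_word("a.b axb", "a.b"))
```

Note that \b only matches between a word character and a non-word character, so this probably won't behave as expected for words that start or end with punctuation.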
This worked for me.
def remove_words_from_str(strn, word, replacement=' '):
    return re.sub(r'(^|\s+)' + word + r'($|\s+)', replacement, strn)
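A caveat with this pattern: it consumes the whitespace on both sides of each match, so when the word occurs twice in a row, the second occurrence has no leading whitespace left to match and is skipped. A possible workaround, swapping in lookarounds instead of the groups used above (remove_ws_delimited is a hypothetical name), could look like this:

```python
import re

def remove_ws_delimited(text, word, replacement=' '):
    # (?<!\S) and (?!\S) assert that no non-space character is adjacent,
    # i.e. the word sits between whitespace or at the start/end of the string.
    # Lookarounds consume nothing, so back-to-back occurrences all match.
    pattern = r'(?<!\S)' + re.escape(word) + r'(?!\S)'
    cleaned = re.sub(pattern, replacement, text)
    # Collapse the leftover runs of whitespace into single spaces.
    return ' '.join(cleaned.split())

print(remove_ws_delimited("is is is this", "is"))
```

Unlike \b, this treats only whitespace as a delimiter, so 'is' in 'is;' would be left alone.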
You could use the .split() method on the string to break it down into single words (it splits on whitespace if no argument is given), and then simply go with list.remove(elem). A solution without using a regex:
def remove_words_from_str(strn, word, replacement=' '):
    return " ".join([replacement if token == word else token for token in strn.split()])
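With the default replacement of ' ', the join above leaves extra spaces where a word was substituted, and list.remove(elem) on its own only removes the first matching element. If the goal is just to drop the word, a sketch that filters the tokens instead (drop_words is a hypothetical name) might be:

```python
def drop_words(text, word):
    # Keep every whitespace-separated token that is not exactly `word`.
    return ' '.join(token for token in text.split() if token != word)

print(drop_words("is this is a test ? yes this is ; this is", "is"))
```

This compares whole tokens, so 'this' is untouched, but a token with punctuation attached (such as 'is,') is not treated as the word either, and all runs of whitespace are collapsed to single spaces.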
How about this? Your pattern is just 'is', so you can substitute directly:
s = 'is this is a test ? yes this is ; this is'
rm = 'is'
re.sub(rm , '', s)
Output: ' th a test? yes th; th '