How to remove WHOLE words from a string in python?

I'm trying to make a function to remove whole words from a string in Python, and I think I have something that does it:

import re

def remove_words_from_str(strn, word, replacement=' '):
    return re.sub(r'(\s*)' + word + r'(\s*)', replacement, strn)

The problem is that this also removes pieces of other words, which I don't want.

EX:  print( remove_words_from_str( "is this is a test ? yes this is ; this is", "is" ) )
OUT:  th  a test ? yes th  ; th  

Is there a way to match only whole words? (In other words, I don't want 'this' to turn into 'th', because the 'is' in 'this' is not a full word.)

Python regex supports the \b anchor, which matches at a word boundary. So you can do:

re.sub(r'\s*\b' + word + r'\b\s*', replacement, strn)

You will still want to keep the greedy \s* quantifiers on both sides to replace all the surrounding spaces with a single space.

The output for your test case is

' this a test ? yes this ; this '

If you want to ensure that the first and last space are removed, use str.strip on the result:

def remove_words_from_str(strn, word, replacement=' '): 
    return re.sub(r'\s*\b' + word + r'\b\s*', replacement, strn).strip()
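
One caveat with building the pattern by concatenation: if `word` contains regex metacharacters (for example `.` or `+`), they are interpreted as part of the pattern rather than as literal characters. A minimal sketch, assuming you want the word treated literally, is to pass it through `re.escape` first. The name `remove_whole_word` is made up for this example and is not from the answer above.

```python
import re

def remove_whole_word(strn, word, replacement=' '):
    # re.escape makes any regex metacharacters in `word` literal.
    pattern = r'\s*\b' + re.escape(word) + r'\b\s*'
    # strip() drops the replacement space left at either end.
    return re.sub(pattern, replacement, strn).strip()

print(remove_whole_word("is this is a test ? yes this is ; this is", "is"))
print(remove_whole_word("a.b axb a.b", "a.b"))
```

Without the escaping, the second call would also remove `axb`, because an unescaped `.` matches any character. Keep in mind that `\b` only behaves as expected when the word itself starts and ends with a word character.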

This worked for me.

def remove_words_from_str(strn, word, replacement=' '):
    return re.sub(r'(^|\s+)' + word + r'($|\s+)', replacement, strn)
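
One thing to watch for with this kind of pattern: each match consumes the whitespace around the word, so when the word occurs twice in a row, the second occurrence has no leading whitespace left to match and is skipped. A possible variation, offered as an assumption rather than the answerer's method, uses the lookarounds `(?<!\S)` and `(?!\S)`, which check for whitespace or a string edge without consuming it. The function name is hypothetical.

```python
import re

def remove_ws_delimited_word(strn, word):
    # (?<!\S): not preceded by a non-space character (start of string or whitespace).
    # (?!\S): not followed by a non-space character (end of string or whitespace).
    pattern = r'(?<!\S)' + re.escape(word) + r'(?!\S)'
    without_word = re.sub(pattern, '', strn)
    # Collapse the leftover runs of whitespace into single spaces.
    return ' '.join(without_word.split())

print(remove_ws_delimited_word("is is this is", "is"))
```

Unlike `\b`, this treats only whitespace as a delimiter, so a word directly followed by punctuation (such as `is;`) is left alone.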

You could use the .split() method on the string to break it down into a list of single words (it splits at whitespace if no argument is given). And then simply go with

list.remove(elem)
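
Note that `list.remove(elem)` drops only the first matching element, so a string that contains the word several times needs a loop or a filter. A rough sketch of the split-and-filter idea, using the hypothetical helper name `drop_word`:

```python
def drop_word(strn, word):
    # split() with no argument splits on any run of whitespace.
    tokens = strn.split()
    # Keep every token that is not exactly the unwanted word.
    kept = [token for token in tokens if token != word]
    return " ".join(kept)

print(drop_word("is this is a test ? yes this is ; this is", "is"))
```

This compares whole whitespace-separated tokens, so a word with punctuation attached (for example `is;`) would not be removed.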

A solution without using a regex:

def remove_words_from_str(strn, word, replacement=' '): 
    return " ".join([replacement if token==word else token for token in strn.split()])

How about this? Your pattern is just 'is', so you can substitute directly:

import re

s = 'is this is a test ? yes this is ; this is'
rm = 'is'
re.sub(rm, '', s)

Output: ' th  a test ? yes th  ; th '
