python re.findall and re.sub

My code:

import re
print(re.findall(r'(?=(Deportivo))(?!.*\bla\b)','Deportivo coruna'))
print(re.sub(r'(?=(Deportivo))(?!.*\bla\b)','','Deportivo coruna'))

I am interested in removing 'Deportivo' if there is no 'la' in the string.

For instance:

re.findall(r'(?=(Deportivo))(?!.*\bla\b)','Deportivo coruna')

returns ['Deportivo'] and

re.findall(r'(?=(Deportivo))(?!.*\bla\b)','Deportivo la coruna')

returns []

However,

re.sub(r'(?=(Deportivo))(?!.*\bla\b)','','Deportivo coruna')

returns 'Deportivo coruna'; the string is unchanged. I am confused as to why. Please help.

There is a difference in the way findall and sub work. According to the docs, re.findall() returns the contents of capturing groups, even if the match itself is the empty string (which it is in your case, since the regex consists entirely of lookahead assertions). re.sub(), by contrast, replaces only the matched text, and replacing an empty match with the empty string leaves the input unchanged.
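A minimal sketch of that difference, using the pattern and input from the question. The variable names are only illustrative; the point is that the match itself is zero-width while group 1 still captures text inside the lookahead:

```python
import re

pattern = re.compile(r'(?=(Deportivo))(?!.*\bla\b)')
text = 'Deportivo coruna'

m = pattern.search(text)
whole_match = m.group(0)   # what the regex actually consumed (nothing)
captured = m.group(1)      # what the group inside the lookahead captured

found = pattern.findall(text)     # findall reports the group contents
replaced = pattern.sub('', text)  # sub replaces only the consumed text

print(repr(whole_match), repr(captured), found, repr(replaced))
```

Since sub() only touches group(0), which is empty here, there is nothing for it to remove.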

So if you want to remove Deportivo from your text if and only if the text doesn't also contain la, you could use

re.sub(r'^(?!.*\bla\b)(.*?)Deportivo',r'\1','Deportivo coruna')

However, that will only remove the first occurrence, and it's not trivial to change that because you would need unlimited repetition in lookbehind assertions, which Python doesn't support. For the record,

re.sub(r'(?<!\bla\b.*)Deportivo(?!.*\bla\b)','','Deportivo coruna')

would do the trick, but that regex won't compile in Python.
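To see the failure for yourself, here is a small sketch that tries to compile a pattern with the same variable-length lookbehind using the standard re module. The names lookbehind_pattern and compiled are just illustrative:

```python
import re

# The lookbehind contains ".*", so its width is not fixed.
lookbehind_pattern = r'(?<!\bla\b.*)Deportivo(?!.*\bla\b)'

try:
    re.compile(lookbehind_pattern)
    compiled = True
except re.error as exc:
    # The standard re module rejects variable-width lookbehind.
    compiled = False
    print('re.error:', exc)
```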

So your best bet is probably to do it in two steps. First, check that your string doesn't contain la. Then replace every Deportivo with the empty string.
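The two steps could be sketched like this. The function name remove_deportivo is made up for illustration, and the sketch assumes that "contains la" means la as a whole word, as in the \bla\b used in the question:

```python
import re

def remove_deportivo(text):
    # Step 1: leave the text alone if it contains the whole word "la".
    if re.search(r'\bla\b', text):
        return text
    # Step 2: otherwise remove every occurrence of "Deportivo".
    return text.replace('Deportivo', '')

print(repr(remove_deportivo('Deportivo coruna')))
print(repr(remove_deportivo('Deportivo la coruna')))
print(repr(remove_deportivo('Deportivo coruna Deportivo')))
```

Note that the surrounding whitespace is left in place, so you may want to call strip() on the result.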
