Python Regex exclude certain prefix
Given the following string:
s = '"foo" "bar2baz_foo" foo( bar2baz_foo( p_foo p_foo.'
I need a regex such that
re.findall(regex, s)
gives
['foo', 'bar2baz_foo', 'foo', 'bar2baz_foo']
So it matches the first four "words", excluding the quotes and parentheses, but not the last two, which start with p_. I have tried a couple of different things, but nothing I came up with actually works.
Hope someone here can help.
Edit: I should add that I want to replace the results with something else and not just find them, i.e. I want to use re.sub and not re.findall. Also, the string is in reality the content of a text file and therefore much longer. I just extracted the relevant bits.
If you're not hell-bent on a pure regex solution, you could use The Greatest Regex Trick Ever: match what you don't want on the left side of an alternation, and capture what you do want in group 1 on the right side.
>>> s = '"foo" "bar2baz_foo" foo( bar2baz_foo( p_foo p_foo.'
>>> import re
>>> list(filter(None, re.findall(r'p_\w*|(\w+)', s)))  # list() is needed on Python 3, where filter() is lazy
['foo', 'bar2baz_foo', 'foo', 'bar2baz_foo']
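To see why the filtering step is needed, it may help to look at what re.findall returns before the empty entries are dropped. The following is only a sketch of that intermediate step; the variable names raw and words are chosen here for illustration.

```python
import re

s = '"foo" "bar2baz_foo" foo( bar2baz_foo( p_foo p_foo.'

# The left alternative consumes the unwanted p_ words without capturing them;
# only the right alternative fills group 1.
pattern = re.compile(r'p_\w*|(\w+)')

# With exactly one group in the pattern, findall returns group 1 for each match,
# so matches made by the left alternative appear as empty strings.
raw = pattern.findall(s)

# Dropping the empty strings leaves only the wanted words.
words = [w for w in raw if w]

print(raw)
print(words)
```

The p_ words are still matched, which is what stops \w+ from picking them up; they simply leave nothing in the capture group.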
Small demo for usage in re.sub:
>>> re.sub(r'p_\w*|(\w+)', lambda m: 'WORD' if m.group(1) else m.group(), s)
'"WORD" "WORD" WORD( WORD( p_foo p_foo.'