Python regex: exclude a certain prefix

Given the following string:

s = '"foo" "bar2baz_foo" foo( bar2baz_foo( p_foo p_foo.'

I need a regex such that

re.findall(regex, s)

gives

['foo', 'bar2baz_foo', 'foo', 'bar2baz_foo']

So it should match the first four "words", without the surrounding quotes and parentheses, but not the last two, which start with p_. I have tried a couple of different things, but nothing I came up with actually works.

Hope someone here can help.

Edit: I should add that I want to replace the matches with something else, not just find them, i.e. I want to use re.sub rather than re.findall. Also, the real string is the content of a text file and therefore much longer; I just extracted the relevant bits.

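If you're not hell-bent on a pure regex solution, you could use "The Greatest Regex Trick Ever": put the thing you want to skip (here, words starting with p_) first in the alternation without capturing it, and capture everything else in a group. The unwanted words still match, but they leave the group empty, so they are easy to discard.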

>>> s = '"foo" "bar2baz_foo" foo( bar2baz_foo( p_foo p_foo.'
>>> import re
>>> list(filter(None, re.findall(r'p_\w*|(\w+)', s)))  # list() is needed on Python 3, where filter() is lazy
['foo', 'bar2baz_foo', 'foo', 'bar2baz_foo']
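
To see why the filtering step is needed, it may help to look at the unfiltered result. The following is a small sketch using the same string and pattern as above; the variable names raw and words are only illustrative.

```python
import re

s = '"foo" "bar2baz_foo" foo( bar2baz_foo( p_foo p_foo.'

# With exactly one capture group, findall returns the content of group 1
# for every match. Words starting with p_ are matched by the first
# alternative, so group 1 is empty for them.
raw = re.findall(r'p_\w*|(\w+)', s)
print(raw)    # ['foo', 'bar2baz_foo', 'foo', 'bar2baz_foo', '', '']

# Dropping the empty strings leaves only the words we wanted.
words = [w for w in raw if w]
print(words)  # ['foo', 'bar2baz_foo', 'foo', 'bar2baz_foo']
```

The two empty strings at the end are the two p_foo matches, which is what filter(None, ...) removes.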

Small demo for usage in re.sub:

>>> re.sub(r'p_\w*|(\w+)', lambda m: 'WORD' if m.group(1) else m.group(), s)
'"WORD" "WORD" WORD( WORD( p_foo p_foo.'
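
Since the real input is the content of a text file, the substitution could be wrapped in a small function. The sketch below is only one way to do it: the function names, the replacement parameter and the prefix parameter are hypothetical, not part of the original answer. It also shows a pure-regex alternative using a negative lookahead, which gives the same result on this sample string.

```python
import re

def replace_unprefixed_words(text, replacement, prefix='p_'):
    """Replace every word that does not start with `prefix` (the alternation trick)."""
    # Prefixed words match the first alternative and leave group 1 empty.
    pattern = re.compile(re.escape(prefix) + r'\w*|(\w+)')
    return pattern.sub(lambda m: replacement if m.group(1) else m.group(), text)

def replace_with_lookahead(text, replacement, prefix='p_'):
    """Alternative: only match words whose start is not followed by `prefix`."""
    # \b anchors at a word start; (?!...) rejects words beginning with the prefix.
    pattern = re.compile(r'\b(?!' + re.escape(prefix) + r')\w+')
    return pattern.sub(lambda m: replacement, text)

s = '"foo" "bar2baz_foo" foo( bar2baz_foo( p_foo p_foo.'
print(replace_unprefixed_words(s, 'WORD'))  # "WORD" "WORD" WORD( WORD( p_foo p_foo.
print(replace_with_lookahead(s, 'WORD'))    # "WORD" "WORD" WORD( WORD( p_foo p_foo.
```

For a file, the text could be read with open(...).read() and passed to either function. The lookahead version also works directly with re.findall, since it needs no capture group.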
