
Python Regular Expression Matching: ## ##

I'm searching a file line by line for occurrences of ##random_string##. It works except when there are more than two # characters in a row:

import re

pattern = r'##(.*?)##'
prog = re.compile(pattern)

string = 'lala ###hey## there'

# Replace every match of the pattern with 'FOUND'
print(prog.sub('FOUND', string))

Desired output:

"lala #FOUND there"

Instead I get the following, because it's grabbing the whole ###hey##:

"lala FOUND there"

So how would I ignore any number of # at the beginning or end, and only capture "##string##"?

Match at least two hashes at both ends:

pattern='##+(.*?)##+'

Your problem is with your inner match. You use . , which matches any character that isn't a line end, and that means it matches # as well. So when it gets ###hey## , it matches (.*?) to #hey .
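
To see the capture described above, here is a small sketch that runs the pattern and sample line from the question. The variable names are only illustrative.

```python
import re

# Pattern and sample line taken from the question.
lazy = re.compile(r'##(.*?)##')
line = 'lala ###hey## there'

match = lazy.search(line)
captured = match.group(1)   # '.' is free to match '#', so the extra hash is captured
whole = match.group(0)      # the full matched text, delimiters included
replaced = lazy.sub('FOUND', line)

print(captured)   # #hey
print(whole)      # ###hey##
print(replaced)   # lala FOUND there
```

The match starts at the first # it can, so the third hash ends up inside the group instead of being left in front of FOUND.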

The easy solution is to exclude the # character from the matchable set:

prog = re.compile(r'##([^#]*)##')

Protip: Use raw strings (e.g. r'' ) for regular expressions so you don't have to go crazy with backslash escapes.

Trying to allow # inside the hashes will make things much more complicated.

EDIT: If you do not want to allow blank inner text (i.e. "####" shouldn't match with an inner text of ""), then change it to:

prog = re.compile(r'##([^#]+)##')

+ means "one or more."
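
As a quick check of the two variants above, the sketch below runs both on the sample line from the question and on a bare "####". The names star, plus, and line are illustrative only.

```python
import re

star = re.compile(r'##([^#]*)##')   # inner text may be empty
plus = re.compile(r'##([^#]+)##')   # inner text needs at least one character

line = 'lala ###hey## there'

# Both skip the extra leading '#' because the inner class cannot match it.
star_result = star.sub('FOUND', line)
plus_result = plus.sub('FOUND', line)

# The variants differ only when there is nothing between the hashes.
star_empty = star.search('####')
plus_empty = plus.search('####')

print(star_result)                 # lala #FOUND there
print(plus_result)                 # lala #FOUND there
print(star_empty.group(1) == '')   # True
print(plus_empty)                  # None
```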

'^#{2,}([^#]*)#{2,}' -- any number of # >= 2 on either end
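
Note that the leading ^ anchors this pattern to the start of the string, so it fits input that begins with the hashes rather than input with text in front of them. A small sketch, with one made-up sample string and the line from the question:

```python
import re

anchored = re.compile(r'^#{2,}([^#]*)#{2,}')

# Any run of two or more '#' is accepted on either side.
at_start = anchored.search('###hey#### there')

# With text before the hashes, '^' prevents a match.
mid_line = anchored.search('lala ###hey## there')

print(at_start.group(1))   # hey
print(mid_line)            # None
```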

Be careful with using . between the delimiters, as in (.*?) or (.*), because . also matches #: a greedy (.*) would match '##abc#####' and capture 'abc###'. Lazy quantifiers can also be slower than a negated character class such as [^#]*.

Try the "block comment trick": /##((?:[^#]|#[^#])+?)##/
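
The /.../ delimiters above are not Python syntax; in Python the same expression would go in a raw string. A minimal sketch, using a made-up sample string, of how the alternation lets a single # appear inside the captured text:

```python
import re

# [^#] matches a non-hash character; #[^#] matches a hash followed by a non-hash,
# so the inner text can contain a lone '#' but never two hashes in a row.
block = re.compile(r'##((?:[^#]|#[^#])+?)##')

inner = block.search('x ##a#b## y').group(1)
print(inner)   # a#b
```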

Add + to the regex, which means match one or more of the preceding character.

import re

pattern = r'#+(.*?)#+'
prog = re.compile(pattern)

string = '###HEY##'
result = prog.search(string)
print(result.group(1))

Output:

HEY

Have you considered doing it the non-regex way?

>>> string='lala ####hey## there'
>>> string.split("####")[1].split("#")[0]
'hey'
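
The split above depends on the exact number of leading hashes in that one string. A slightly more general non-regex sketch, using a hypothetical helper named extract_between_hashes (not part of the original answer), might look like this. It assumes only the first run of hashes on the line is of interest.

```python
def extract_between_hashes(line):
    """Return the text between the first '##...' run and a closing '##', or None.

    Hypothetical helper for illustration; it only looks at the first run of hashes.
    """
    start = line.find("##")
    if start == -1:
        return None
    # Drop the opening run of hashes, however long it is.
    rest = line[start:].lstrip("#")
    # Split at the next '#'; the closing delimiter needs a second '#' right after it.
    inner, _, tail = rest.partition("#")
    if not inner or not tail.startswith("#"):
        return None
    return inner


print(extract_between_hashes("lala ####hey## there"))   # hey
print(extract_between_hashes("lala ###hey## there"))    # hey
print(extract_between_hashes("no hashes here"))         # None
```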

>>> import re
>>> text = 'lala ###hey## there'
>>> matcher = re.compile(r"##[^#]+##")
>>> print(matcher.sub("FOUND", text))
lala #FOUND there
