Python Regular Expression Matching: ## ##
I'm searching a file line by line for the occurrence of ##random_string##. It works except for the case of multiple #...
import re
pattern='##(.*?)##'
prog=re.compile(pattern)
string='lala ###hey## there'
result=prog.search(string)
print(re.sub(result.group(1), 'FOUND', string))
Desired output:
"lala #FOUND there"
Instead I get the following, because it's grabbing the whole ###hey##:
"lala FOUND there"
So how would I ignore any number of # at the beginning or end, and only capture "##string##"?
Match at least two hashes on each end:
pattern='##+(.*?)##+'
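A quick check of this pattern against the string from the question, as a minimal sketch (the variable names are illustrative): it captures hey cleanly, but because the leading ##+ is greedy it also consumes all three opening hashes, so substituting the whole match removes the extra # as well.

```python
import re

# At least two hashes on each side; the leading ##+ grabs the whole ### run.
pattern = re.compile(r'##+(.*?)##+')
text = 'lala ###hey## there'

captured = pattern.search(text).group(1)   # text between the hash runs
replaced = pattern.sub('FOUND', text)      # the entire ###hey## is replaced

print(captured)   # hey
print(replaced)   # lala FOUND there
```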
Your problem is with your inner match. You use ., which matches any character that isn't a line end, and that means it matches # as well. So when it gets ###hey##, it matches (.*?) to #hey.
The easy solution is to exclude the # character from the matchable set:
prog = re.compile(r'##([^#]*)##')
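A minimal sketch of this fix applied to the question's input (names are illustrative). Because [^#]* cannot consume a #, the search fails at the first # of ### and succeeds one position later, so one leading # is left in place, which gives the desired output. It also shows that "####" matches with empty inner text.

```python
import re

# The inner class [^#]* can never consume '#', so the match that succeeds
# is '##hey##', starting at the second '#' of the '###' run.
prog = re.compile(r'##([^#]*)##')
text = 'lala ###hey## there'

captured = prog.search(text).group(1)
replaced = prog.sub('FOUND', text)
empty_inner = prog.search('a####b').group(1)  # '####' matches, inner text is ''

print(captured)           # hey
print(replaced)           # lala #FOUND there
print(repr(empty_inner))  # ''
```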
Protip: Use raw strings (e.g. r'') for regular expressions so you don't have to go crazy with backslash escapes.
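A small illustration of why raw strings matter, as a sketch using the standard \b (word boundary) escape, which is not part of the original answer: in a normal string literal '\b' is the backspace character, so the regex engine never sees a word boundary.

```python
import re

text = 'say hey now'

# In a normal string literal, '\b' is the backspace character (\x08),
# so this pattern searches for literal backspaces and finds nothing.
plain = re.search('\bhey\b', text)

# In a raw string, r'\b' stays as backslash + 'b', which the regex
# engine reads as a word boundary.
raw = re.search(r'\bhey\b', text)

print(plain)        # None
print(raw.group())  # hey
```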
Trying to allow # inside the hashes will make things much more complicated.
EDIT: If you do not want to allow blank inner text (i.e. "####" shouldn't match with an inner text of ""), then change it to:
prog = re.compile(r'##([^#]+)##')
+ means "one or more."
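A minimal check of the + variant (names are illustrative): "####" no longer matches, while the marker from the question is still found.

```python
import re

# [^#]+ demands at least one non-# character between the delimiters.
prog = re.compile(r'##([^#]+)##')

no_match = prog.search('a####b')
found = prog.search('lala ###hey## there').group(1)

print(no_match)  # None
print(found)     # hey
```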
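'^#{2,}([^#]*)#{2,}'
matches any number of # (two or more) on either end. Note that the leading ^ anchors the match to the start of the string, so drop it if the hashes can appear mid-line, as in 'lala ###hey## there'.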
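A sketch comparing the pattern with and without the ^ anchor (names are illustrative). Without the anchor, #{2,} matches the whole ### run, so a substitution removes every opening hash.

```python
import re

anchored = re.compile(r'^#{2,}([^#]*)#{2,}')
unanchored = re.compile(r'#{2,}([^#]*)#{2,}')
text = 'lala ###hey## there'

# '^' only matches at the start of the string, so mid-line markers are missed.
anchored_mid = anchored.search(text)
anchored_start = anchored.search('###hey## there').group(1)

# Without '^', the greedy #{2,} consumes the whole '###' run.
unanchored_group = unanchored.search(text).group(1)
unanchored_sub = unanchored.sub('FOUND', text)

print(anchored_mid)      # None
print(anchored_start)    # hey
print(unanchored_group)  # hey
print(unanchored_sub)    # lala FOUND there
```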
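Be careful with lazy quantifiers like (.*?): they still let # into the captured text. For example, a pattern anchored at both ends, such as ^##(.*?)##$, matches '##abc#####' and captures 'abc###'. Lazy quantifiers can also be slow, because the engine retries the rest of the pattern after each character it adds to the match.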
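The answer does not say which pattern produces 'abc###'; this minimal sketch assumes one anchored at both ends, and contrasts it with a [^#]+ version. re.fullmatch requires Python 3.4 or later.

```python
import re

s = '##abc#####'

# Anchored at both ends, the lazy (.*?) keeps growing until the final '##'
# sits at the end of the string, so it swallows three of the hashes.
lazy = re.match(r'^##(.*?)##$', s)
print(lazy.group(1))  # abc###

# With [^#]+ the inner text cannot contain '#', so the string is rejected
# instead of being captured with stray hashes.
strict = re.fullmatch(r'##([^#]+)##', s)
print(strict)         # None
```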
Try the "block comment trick": /##((?:[^#]|#[^#])+?)##/ (in Python, r'##((?:[^#]|#[^#])+?)##').
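A minimal sketch of the block comment trick in Python (names are illustrative). It allows a lone # inside the markers. Note that on the question's input it absorbs the third opening # into the capture, giving #hey rather than hey.

```python
import re

# The inner text is a run of either a non-# character or a '#' followed by
# a non-# character, so a lone '#' is allowed inside but '##' ends the match.
block = re.compile(r'##((?:[^#]|#[^#])+?)##')

inner_hash = block.search('lala ##a#b## there').group(1)
question = block.search('lala ###hey## there').group(1)

print(inner_hash)  # a#b
print(question)    # #hey
```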
Add + to the regex, which means "match one or more of the preceding item":
import re
pattern='#+(.*?)#+'
prog=re.compile(pattern)
string='###HEY##'
result=prog.search(string)
print(result.group(1))
Output:
HEY
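A sketch of the same pattern on the question's own string (names are illustrative). #+ consumes the whole ### run, and because it also accepts a single #, single-hash delimiters match too.

```python
import re

pattern = re.compile(r'#+(.*?)#+')

hey = pattern.search('###HEY##').group(1)
sub_result = pattern.sub('FOUND', 'lala ###hey## there')
# '#+' accepts one '#', so single-hash delimiters are matched as well.
single = pattern.search('a#b#c').group(1)

print(hey)         # HEY
print(sub_result)  # lala FOUND there
print(single)      # b
```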
Have you considered doing it the non-regex way?
>>> string='lala ####hey## there'
>>> string.split("####")[1].split("#")[0]
'hey'
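The split("####") approach depends on the opening run being exactly four hashes: 'lala ###hey## there'.split('####') has only one element, so indexing [1] raises IndexError. A minimal non-regex sketch that skips a run of any length, using a hypothetical helper named extract_hashed (not from the original answer):

```python
def extract_hashed(line):
    """Hypothetical helper: return the text after the first run of two or
    more '#' up to the next '##', or None if there is no such marker."""
    start = line.find('##')
    if start == -1:
        return None
    # Skip the whole run of opening '#' characters, however long it is.
    i = start
    while i < len(line) and line[i] == '#':
        i += 1
    end = line.find('##', i)
    if end == -1:
        return None
    return line[i:end]


print(extract_hashed('lala ###hey## there'))   # hey
print(extract_hashed('lala ####hey## there'))  # hey
print(extract_hashed('no markers here'))       # None
```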
>>> import re
>>> text= 'lala ###hey## there'
>>> matcher= re.compile(r"##[^#]+##")
>>> print(matcher.sub("FOUND", text))
lala #FOUND there
>>>
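Since the question searches a file line by line, here is a minimal sketch that applies the same substitution to every line. The helper name replace_markers is hypothetical, and io.StringIO stands in for a real file object opened with open().

```python
import io
import re

MARKER = re.compile(r'##[^#]+##')


def replace_markers(lines):
    """Hypothetical helper: substitute every ##text## marker on each line."""
    return [MARKER.sub('FOUND', line.rstrip('\n')) for line in lines]


# io.StringIO stands in for a real file; iterating it yields lines.
fake_file = io.StringIO('lala ###hey## there\nno markers here\n##a## and ##b##\n')
result = replace_markers(fake_file)
print(result)  # ['lala #FOUND there', 'no markers here', 'FOUND and FOUND']
```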