regex + Python: How to find string with '?' in it?
I have a multi-line string in the content variable, and I need to retrieve all matches for a pattern uri containing a question mark.
This is what I have so far:
content = """
/blog:text:Lorem ipsum dolor sit amet, consectetur adipisicing elit
<break>
text:Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
<break>
text:Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia.
/blog?page=1:text:Lorem ipsum dolor sit amet, consectetur adipisicing elit
<break>
text:Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
<break>
text:Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia.
"""
#uri = '/blog' # Works fine
uri = '/blog?page=1'
re.findall('^(?ism)%s?:(.*?)(\n\n)' % uri, content)
It works fine until uri contains a ? followed by parameters; then I get an empty list.
Any ideas how to fix the regex?
Python's re.escape() is your friend. If you don't use it, the ? inside the uri is treated with its usual meaning inside a regular expression (making the prior item a 0-or-1 match).
uri = '/blog?page=1'
# Inline flags moved to the front: Python 3.11+ rejects global flags placed after ^
re.findall('(?ism)^%s?:(.*?)(\n\n)' % re.escape(uri), content)
I'm not clear exactly what you want the ?: after the %s to do, so I'm leaving it in on the potentially faulty presumption that it's there for a reason. As written, that ? makes the final character of the uri optional.
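To make the effect of the unescaped ? concrete, here is a minimal sketch. The two-line sample string is a hypothetical stand-in for the question's content, and the simplified pattern drops the trailing ? after %s; it is an illustration of re.escape(), not necessarily the exact pattern the asker needs.

```python
import re

uri = '/blog?page=1'

# Unescaped, "g?" means "an optional g", so the pattern can match
# "/blogpage=1" but never a literal question mark.
assert re.search(uri, '/blog?page=1') is None
assert re.search(uri, '/blogpage=1') is not None

# re.escape() backslash-escapes the metacharacters, so "?" becomes literal.
escaped = re.escape(uri)
assert re.search(escaped, '/blog?page=1') is not None

# Hypothetical two-line sample standing in for the question's content.
sample = "/blog:text:first\n/blog?page=1:text:second\n"
found = re.findall(r'^%s:(.*)$' % escaped, sample, flags=re.M)
print(found)  # ['text:second']
```

Passing re.M through the flags argument, rather than as an inline (?m) group, sidesteps any question of where inline flags may appear in the pattern.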
I'd keep it simple: find the possible matches, then keep only those containing a ?, e.g.:
import re
candidates = (m.group(1) for m in re.finditer('^(.*?):', content, flags=re.M))
matches = [m for m in candidates if '?' in m]
# ['/blog?page=1']
I didn't see two consecutive newlines in your content. Also, I have escaped the ? in uri, as it's a regex metacharacter.
uri = r'/blog\?page=1'  # raw string so the backslash reaches the regex engine unchanged
re.findall('(?ism)^%s?:(.*?)[\n\r]' % uri, content)
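If the goal is to capture the whole multi-line entry rather than just the rest of the first line, one possible approach is a lookahead that stops at the next entry. The sketch below rests on an assumption not stated in the question: that every entry starts with a line beginning with "/" and runs until the next such line or the end of the string. The content here is a shortened, hypothetical stand-in, and extract_entry is an illustrative helper name.

```python
import re

# Shortened stand-in for the question's content (no blank lines between entries).
content = (
    "/blog:text:first\n<break>\ntext:second\n"
    "/blog?page=1:text:third\n<break>\ntext:fourth\n"
)

def extract_entry(uri, text):
    """Return the bodies of all entries whose header is exactly `uri`.

    Assumption: an entry ends where the next line starting with "/"
    begins, or at the end of the string.
    """
    pattern = re.compile(
        r'^%s:(.*?)(?=^/|\Z)' % re.escape(uri),
        flags=re.M | re.S,  # ^ matches at line starts, . matches newlines
    )
    return pattern.findall(text)

entries = extract_entry('/blog?page=1', content)
print(entries)  # ['text:third\n<break>\ntext:fourth\n']
```

Because the uri is escaped and followed directly by the colon, extract_entry('/blog', content) returns only the first entry and does not pick up the /blog?page=1 one.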