Regex + Python: how to find a string with '?' in it?

Question

I have a multi-line string in the content variable, and I need to retrieve all matches for a pattern uri that contains a question mark.

This is what I have so far:

content = """
/blog:text:Lorem ipsum dolor sit amet, consectetur adipisicing elit
<break>
text:Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
<break>
text:Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia.

/blog?page=1:text:Lorem ipsum dolor sit amet, consectetur adipisicing elit
<break>
text:Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
<break>
text:Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia.
"""

#uri = '/blog'  # Works fine
uri = '/blog?page=1'
# Inline flags placed first: Python 3.11+ rejects global flags after '^'
re.findall('(?ism)^%s?:(.*?)(\n\n)' % uri, content)  # -> []

It works fine until uri contains a ? followed by parameters; then I get an empty list.

Any ideas how to fix the regex?
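A quick check of why the list comes back empty (an illustration added here, not part of the original question). Once uri is substituted, the template '%s?:' produces '/blog?page=1?:', and in that pattern each '?' is a quantifier that makes the preceding character optional rather than a literal question mark:

```python
import re

uri = '/blog?page=1'
pattern = '%s?:' % uri  # -> '/blog?page=1?:'

# '?' is a quantifier here: 'g?' means an optional 'g', '1?' an optional '1'.
# The literal '?' in the real text is therefore never matched...
literal = re.search(pattern, '/blog?page=1:text')
# ...while a string with no '?' at all does match.
no_question_mark = re.search(pattern, '/blogpage=1:text')

print(literal)                   # None
print(no_question_mark.group())  # /blogpage=1:
```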

Answer 1

Python's re.escape() is your friend. If you don't use it, the ? inside uri is treated with its usual meaning in a regular expression (making the preceding item a 0-or-1 match).

uri = '/blog?page=1'
# re.escape() makes the '?' literal; inline flags go first (required on Python 3.11+)
re.findall('(?ism)^%s?:(.*?)(\n\n)' % re.escape(uri), content)

I'm not clear exactly what you want the ?: after the %s to do, so I'm leaving it in on the potentially faulty presumption that it's there for a reason.
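A sketch along the lines of this answer, with two changes that are my own assumptions: the ? after %s is dropped (it only makes the last character of the escaped uri optional), and an entry is taken to end at a blank line or at the end of the string. The second change matters for the sample content: the /blog?page=1 entry is the last one and is followed by a single newline, so (\n\n) never matches there and the call above returns an empty list. The content here is a shortened stand-in with the same layout; find_entries is a hypothetical helper name.

```python
import re

# Shortened stand-in for the question's content: a blank line between
# entries, and only a single newline after the last entry.
content = (
    "\n"
    "/blog:text:Lorem ipsum\n"
    "<break>\n"
    "text:Duis aute\n"
    "\n"
    "/blog?page=1:text:Lorem ipsum\n"
    "<break>\n"
    "text:Excepteur sint\n"
)


def find_entries(uri, text):
    """Return the body of every entry whose line starts with 'uri:'."""
    # re.escape() turns '?' into '\?', so it is matched literally.
    # Assumption: an entry ends at a blank line or at the end of the text.
    pattern = r'(?ism)^%s:(.*?)(?:\n\n|\n?\Z)' % re.escape(uri)
    return re.findall(pattern, text)


print(re.escape('/blog?page=1'))  # /blog\?page=1 (Python 3.7+)
print(find_entries('/blog?page=1', content))
print(find_entries('/blog', content))
```

Because the (?:...) group is non-capturing, findall returns plain strings (the entry bodies) instead of tuples.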

Answer 2

I'd keep it simple: find the possible matches, then keep only those containing a ?, e.g.:

import re

candidates = (m.group(1) for m in re.finditer('^(.*?):', content, flags=re.M))
matches = [m for m in candidates if '?' in m]
# ['/blog?page=1']
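Taking the same keep-it-simple approach one step further, here is a regex-free sketch that also returns each matching entry's text. It is my own variant, assuming entries are separated by exactly one blank line and that the URI is everything before the first ':' of an entry (so a URI that itself contains ':' would break it). The content is a shortened stand-in, and entries_by_uri is a hypothetical helper name.

```python
# Shortened stand-in for the question's content.
content = (
    "\n"
    "/blog:text:Lorem ipsum\n"
    "<break>\n"
    "text:Duis aute\n"
    "\n"
    "/blog?page=1:text:Lorem ipsum\n"
    "<break>\n"
    "text:Excepteur sint\n"
)


def entries_by_uri(text):
    """Map each entry's URI (text before its first ':') to the rest of the entry."""
    entries = {}
    for block in text.strip().split('\n\n'):  # assumption: one blank line between entries
        uri, sep, body = block.partition(':')
        if sep:  # skip blocks that contain no ':' at all
            entries[uri] = body
    return entries


with_query = {uri: body for uri, body in entries_by_uri(content).items() if '?' in uri}
print(with_query)
# {'/blog?page=1': 'text:Lorem ipsum\n<break>\ntext:Excepteur sint'}
```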

Answer 3

I didn't see two newlines after the last entry in your content. Also, I have escaped the ? in uri, since ? is a special character in regular expressions.

uri = r'/blog\?page=1'  # raw string, so '\?' is not an invalid string escape
re.findall(r'(?ism)^%s?:(.*?)[\n\r]' % uri, content)
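For comparison, a runnable version of this answer on a shortened stand-in for content. Because the match now stops at the first line break ([\n\r]), only the first text: line of the entry is captured, not the whole entry. The raw strings keep Python 3.12+ from warning about the invalid escape sequence '\?', and the inline flags come first because Python 3.11+ raises an error when they appear after '^'.

```python
import re

# Shortened stand-in for the question's content.
content = (
    "\n"
    "/blog:text:Lorem ipsum\n"
    "<break>\n"
    "text:Duis aute\n"
    "\n"
    "/blog?page=1:text:Lorem ipsum\n"
    "<break>\n"
    "text:Excepteur sint\n"
)

uri = r'/blog\?page=1'  # escaped by hand; same as re.escape('/blog?page=1') on Python 3.7+
# The lazy (.*?) stops at the first line break, so only one line is captured.
first_lines = re.findall(r'(?ism)^%s?:(.*?)[\n\r]' % uri, content)
print(first_lines)  # ['text:Lorem ipsum']
```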

3 answers

Answer 1: score 1, accepted, 2014-03-10 17:51:03
Answer 2: score 1, 2014-03-10 17:55:06
Answer 3: score 0, 2014-03-10 17:50:55
