Regex: matching between two patterns in a multiline string

Question

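I have a multiline string, and I want a regular expression that grabs whatever lies between two patterns. Here I am trying to match everything between the title and the date.

For example: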

import re
s = """\n#here's a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30"""
re.findall(r'#.+\n', s)[0][1:-1]  # this grabs the title
Out: "here's a title"
re.findall(r'Posted on .+', s)[0][10:]  # this grabs the date (s has no trailing newline)
Out: "11-09-2014 02:32:30"
re.findall(r'^[#\W+]', s)  # try to grab everything after the title
Out: ['\n']  # but it only grabs until the end of line

Answer 1

>>> s = '''\n#here's a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30'''
>>> m1 = re.search(r'^#.+$', s, re.MULTILINE)
>>> m2 = re.search(r'^Posted on ', s, re.MULTILINE)
>>> m1.end()
16
>>> m2.start()
34
>>> s[m1.end():m2.start()]
'\n\nhello world!!!\n\n'

Don't forget to check that m1 and m2 are not None.
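
The None check can be folded into a small helper. This is only a minimal sketch, not part of the original answer: the name `text_between`, its default patterns, and the convention of returning `None` when either marker is missing are assumptions made for illustration.

```python
import re

def text_between(s, start_pattern=r'^#.+$', end_pattern=r'^Posted on '):
    """Return the text between the first start match and the next end match, or None."""
    m1 = re.search(start_pattern, s, re.MULTILINE)
    if m1 is None:
        return None
    # Look for the end marker only after the end of the title line
    m2 = re.compile(end_pattern, re.MULTILINE).search(s, m1.end())
    if m2 is None:
        return None
    return s[m1.end():m2.start()]

s = "\n#here's a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30"
print(repr(text_between(s)))            # '\n\nhello world!!!\n\n'
print(text_between("no markers here"))  # None
```

Call `.strip()` on the result if the surrounding newlines are not wanted.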

Answer 2

>>> re.findall(r'\n([^#].*)Posted', s, re.S)
['\nhello world!!!\n\n']

If you want to avoid the newlines:

>>> re.findall(r'^([^#\n].*?)\n+Posted', s, re.S | re.M)
['hello world!!!']

Answer 3

You can match everything with a single regular expression.

>>> s = '''\n#here's a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30'''
>>> re.search(r'#([^\n]+)\s+([^\n]+)\s+\D+([^\n]+)', s).groups()
("here's a title", 'hello world!!!', '11-09-2014 02:32:30')

Answer 4

You should use group matching with parentheses:

    # Non-greedy (.*?) so the captured content stops before the newlines that precede "Posted on"
    result = re.search(r'#[^\n]+\n+(.*?)\n+Posted on [^\n]*', s, re.MULTILINE | re.DOTALL)
    result.group(1)

Here I used search, but you can still use findall if the same string may contain several matches.

If you want to capture the title, the content, and the date, you can use multiple groups:

    # (.*?) is non-greedy so the content group does not swallow a trailing newline or later entries
    result = re.search(r'#([^\n]+)\n+(.*?)\n+Posted on ([^\n]*)', s, re.MULTILINE | re.DOTALL)
    result.group(1) # The title
    result.group(2) # The contents
    result.group(3) # The date

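Capturing all three results in one regex is much better than using a separate regex for each part, especially if your multiline string may contain several matches (in which case "synchronizing" the individual findall results can easily produce wrong title-content-date combinations).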
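
To illustrate the multiple-match case, here is a rough sketch that uses `re.finditer` with a single three-group pattern. The two-entry input `text` and the variable names are made up for this example. The content group is deliberately non-greedy, because with `re.DOTALL` a greedy `.*` would run across entry boundaries.

```python
import re

# Hypothetical input with two title/content/date entries
text = (
    "#first title\n\nfirst body\n\nPosted on 11-09-2014 02:32:30\n"
    "\n#second title\n\nsecond body,\nline two\n\nPosted on 12-09-2014 08:00:00\n"
)

# Non-greedy (.*?) keeps each content group inside its own entry
entry = re.compile(r'#([^\n]+)\n+(.*?)\n+Posted on ([^\n]*)', re.DOTALL)

# Each match yields one (title, content, date) tuple that belongs together
entries = [m.groups() for m in entry.finditer(text)]
for title, content, date in entries:
    print(title, '|', repr(content), '|', date)
```

Because each tuple comes from one match, the title, content, and date can never be paired with fields from a different entry.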

If you plan to use this regex heavily, consider compiling it once for better performance:

    # Same three groups as above (title, contents, date), compiled once and reused
    regex = re.compile(r'#([^\n]+)\n+(.*?)\n+Posted on ([^\n]*)', re.MULTILINE | re.DOTALL)
    # ...
    result = regex.search(s)
    result = regex.search('another multiline string, ...')

Answer 5

Use group matching together with a non-greedy search (.*?), and give the group a name so it is easy to look up.

>>> s = '\n#here\'s a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30'
>>> pattern = r'\s*#[\w \']+\n+(?P<content>.*?)\n+Posted on'
>>> a = re.match(pattern, s, re.S)  # re.S lets .*? span newlines if the content has several lines
>>> a.group('content')
'hello world!!!'
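
The same idea extends to all three fields. The following is a sketch under assumptions rather than the answerer's code: the group names `title` and `date` and the use of `groupdict()` are additions for illustration.

```python
import re

s = "\n#here's a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30"

pattern = re.compile(
    r'#(?P<title>[^\n]+)\n+'       # title line, without the leading '#'
    r'(?P<content>.*?)\n+'         # everything up to the newline(s) before the date line
    r'Posted on (?P<date>[^\n]+)', # rest of the "Posted on" line
    re.DOTALL,
)

m = pattern.search(s)
# groupdict() maps each group name to its captured text
fields = m.groupdict() if m else {}
print(fields)
```

Looking fields up by name (`fields['date']`) stays correct even if groups are later added or reordered in the pattern.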

5 answers

Answer 1
1 accepted 2014-09-10 22:57:21

Answer 2
1 2014-09-10 23:01:36

Answer 3
1 2014-09-10 23:05:26

Answer 4
0 2014-09-10 23:21:31

Answer 5
0 2014-09-10 23:27:23
