Match multiline content completely in Python using regex

Question

I am trying to extract content that spans multiple lines. The content looks like this:

some content here
[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
 *something here
 *another something here
not relevant, should not be returned
[1/3/2015 - SSR] another one

There is always a space before the *.

The code I am using is:

re.search(r'.*- SSR](.*)',line,re.DOTALL)

The expected output is:

[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
 *something here
 *another something here
[1/3/2015 - SSR] another one

However, it only retrieves the first and third records, not the second one, because that record spans multiple lines. Can anybody help? I would really appreciate it.

Answer 1

You can use a regex like this:

^.*?- SSR]([^[]*)

Match information:

MATCH 1
1.  [34-45] ` something
`
MATCH 2
1.  [61-111]    ` another:
*something here
*another something here
`
MATCH 3
1.  [127-139]   ` another one`

In Python, you can use it like this:

import re

# r'' instead of ur'': the ur prefix is a syntax error in Python 3
p = re.compile(r'^\[.*?- SSR]([^[]*)', re.DOTALL | re.MULTILINE)
test_str = "some content here\n[1/1/2015 - SSR] something\n[1/2/2015 - SSR] another:\n*something here\n*another something here\n[1/3/2015 - SSR] another one"

print(p.findall(test_str))

On the other hand, if you also want the group to capture the bracketed prefix at the start of each record, you can use this regex:

^(\[.*?- SSR][^[]*)

Match information:

MATCH 1
1.  [18-45] `[1/1/2015 - SSR] something
`
MATCH 2
1.  [45-111]    `[1/2/2015 - SSR] another:
*something here
*another something here
`
MATCH 3
1.  [111-139]   `[1/3/2015 - SSR] another one`
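
Both patterns above rely on `[^[]*`, which runs until the next `[`. On the full input from the question, that would also pick up the line `not relevant, should not be returned`. Below is a minimal sketch of a stricter alternative, assuming (as the question states) that every continuation line starts with a space followed by `*`. The names `text`, `record_re`, and `records` are illustrative only and are not part of the original answer.

```python
import re

# Sample text from the question, including the unrelated lines.
text = (
    "some content here\n"
    "[1/1/2015 - SSR] something\n"
    "[1/2/2015 - SSR] another:\n"
    " *something here\n"
    " *another something here\n"
    "not relevant, should not be returned\n"
    "[1/3/2015 - SSR] another one"
)

# Header line, then zero or more continuation lines that start with " *".
# re.DOTALL is deliberately not used, so each ".*" stays on a single line.
record_re = re.compile(r'^\[[^\]\n]*- SSR\].*(?:\n \*.*)*', re.MULTILINE)

records = record_re.findall(text)
for record in records:
    print(record)
```

Because the pattern has no capturing groups, `findall` returns each whole record, and the unrelated lines are skipped because they neither start with `[` nor with a space followed by `*`.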

Answer 2

Assuming the text itself can contain square brackets, you can match on the entire preamble and use a lookahead inside a non-capturing group to find where each record ends. The \Z alternative at the end is needed for the last record, which has no following preamble.

import re

s = """[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
*something here
*another something here
[1/3/2015 - SSR] another one"""

# print() calls so the script runs on Python 3
print('string to process')
print(s)
print()
print('matches')
matches = re.findall(
    r'\[\d+/\d+/\d+ - SSR\].*?(?:(?=\[\d+/\d+/\d+ - SSR\])|\Z)',
    s, re.MULTILINE | re.DOTALL)
for i, match in enumerate(matches, 1):
    print("%d: %s" % (i, match.strip()))

The output is:

string to process
[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
*something here
*another something here
[1/3/2015 - SSR] another one

matches
1: [1/1/2015 - SSR] something
2: [1/2/2015 - SSR] another:
*something here
*another something here
3: [1/3/2015 - SSR] another one
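
To illustrate why the `\Z` alternative matters, here is a small sketch that runs the same pattern with and without it on the same sample string. The variable names `preamble`, `with_z`, and `without_z` are made up for this illustration.

```python
import re

s = """[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
*something here
*another something here
[1/3/2015 - SSR] another one"""

preamble = r'\[\d+/\d+/\d+ - SSR\]'

# A record ends either just before the next preamble or at the end of the string.
with_z = re.findall(preamble + r'.*?(?:(?=' + preamble + r')|\Z)', s, re.DOTALL)

# Hypothetical variant without \Z: every record must be followed by a preamble,
# so the last record can never match.
without_z = re.findall(preamble + r'.*?(?=' + preamble + r')', s, re.DOTALL)

print(len(with_z))     # 3
print(len(without_z))  # 2
```

Without `\Z`, the lazy `.*?` for the last record never finds a following preamble, so that record is silently dropped.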
