I am trying to extract content that spans multiple lines. The content looks like this:
some content here
[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
*something here
*another something here
not relevant, should not be returned
[1/3/2015 - SSR] another one
There is always a space before the *.
The code I am using is:
re.search(r'.*- SSR](.*)', line, re.DOTALL)
The expected output is:
[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
*something here
*another something here
[1/3/2015 - SSR] another one
However, it only retrieves the first and third records, not the second one, since that record spans multiple lines. Can anybody help? I would really appreciate it.
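The question doesn't show where `line` comes from. Assuming it holds one line of the input at a time (as the name suggests), the `*` lines can never be part of a match, because each call only sees a single line. Running the same pattern over the whole text doesn't help either: with `re.DOTALL` the greedy `.*` backtracks only as far as the last `- SSR]`, so a single match comes back. A small sketch of both cases, using the sample input from the question:

```python
import re

# Sample input from the question.
text = (
    "some content here\n"
    "[1/1/2015 - SSR] something\n"
    "[1/2/2015 - SSR] another:\n"
    "*something here\n"
    "*another something here\n"
    "not relevant, should not be returned\n"
    "[1/3/2015 - SSR] another one"
)
pattern = r'.*- SSR](.*)'  # the pattern from the question

# Assumed usage: search one line at a time. The '*' lines never contain
# "- SSR]", so they are never part of any match.
per_line = []
for line in text.splitlines():
    m = re.search(pattern, line, re.DOTALL)
    if m:
        per_line.append(m.group(1))
print(per_line)  # [' something', ' another:', ' another one']

# Whole text at once: with re.DOTALL the greedy '.*' backtracks only to the
# LAST "- SSR]", so the single match holds just the final record's content.
whole = re.search(pattern, text, re.DOTALL)
print(repr(whole.group(1)))  # ' another one'
```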
You can use a regex like this:
^.*?- SSR]([^[]*)
Match information:
MATCH 1
1. [34-45] ` something
`
MATCH 2
1. [61-111] ` another:
*something here
*another something here
`
MATCH 3
1. [127-139] ` another one`
You can use something like this:
import re
p = re.compile(r'^\[.*?- SSR]([^[]*)', re.DOTALL | re.MULTILINE)
test_str = "some content here\n[1/1/2015 - SSR] something\n[1/2/2015 - SSR] another:\n*something here\n*another something here\n[1/3/2015 - SSR] another one"
# findall returns group 1 of each match: the text after "- SSR]" up to the next "["
print(p.findall(test_str))
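The offsets in the match information above are the start and end indices of group 1 within that test string. A minimal Python 3 sketch that prints them with `re.finditer` and `Match.span`:

```python
import re

p = re.compile(r'^\[.*?- SSR]([^[]*)', re.DOTALL | re.MULTILINE)
test_str = ("some content here\n[1/1/2015 - SSR] something\n"
            "[1/2/2015 - SSR] another:\n*something here\n"
            "*another something here\n[1/3/2015 - SSR] another one")

# span(1) is the (start, end) index pair of the captured group in each match.
spans = [m.span(1) for m in p.finditer(test_str)]
print(spans)  # [(34, 45), (61, 111), (127, 139)]
```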
Alternatively, if you also want the header at the start of each record (e.g. [1/1/2015 - SSR]) inside the captured group, you can use this regex:
^(\[.*?- SSR][^[]*)
Match information:
MATCH 1
1. [18-45] `[1/1/2015 - SSR] something
`
MATCH 2
1. [45-111] `[1/2/2015 - SSR] another:
*something here
*another something here
`
MATCH 3
1. [111-139] `[1/3/2015 - SSR] another one`
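The same approach in Python 3 with the header moved inside the group; this is a sketch reusing the test string from the earlier snippet, with `strip()` added only to drop each record's trailing newline:

```python
import re

p = re.compile(r'^(\[.*?- SSR][^[]*)', re.DOTALL | re.MULTILINE)
test_str = ("some content here\n[1/1/2015 - SSR] something\n"
            "[1/2/2015 - SSR] another:\n*something here\n"
            "*another something here\n[1/3/2015 - SSR] another one")

# Group 1 now starts at the '[' of each header, so the records are contiguous.
spans = [m.span(1) for m in p.finditer(test_str)]
records = [m.group(1).strip() for m in p.finditer(test_str)]
print(spans)  # [(18, 45), (45, 111), (111, 139)]
for record in records:
    print(record)
```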
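If the record text itself can contain square brackets (which would make [^[]* stop too early), you can instead use the entire header pattern inside a lookahead, which marks where the next record begins without consuming it. The \Z alternative towards the end is needed for the last record, which is not followed by another header.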
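A small sketch of why the `\Z` alternative matters, using a hypothetical two-record string: without it, the lazy `.*?` of the last record never finds a following header, so that record is not matched at all.

```python
import re

header = r'\[\d+/\d+/\d+ - SSR\]'
s = "[1/1/2015 - SSR] something\n[1/3/2015 - SSR] another one"

# A record ends right before the next header (lookahead only) ...
without_end = header + r'.*?(?=' + header + r')'
# ... or, with the extra alternative, at the very end of the text (\Z).
with_end = header + r'.*?(?:(?=' + header + r')|\Z)'

print(re.findall(without_end, s, re.DOTALL))
# ['[1/1/2015 - SSR] something\n']
print(re.findall(with_end, s, re.DOTALL))
# ['[1/1/2015 - SSR] something\n', '[1/3/2015 - SSR] another one']
```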
import re
s = """[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
*something here
*another something here
[1/3/2015 - SSR] another one"""
print('string to process')
print(s)
print()
print('matches')
# Each match starts at a header and lazily extends up to the next header
# (a lookahead, so that header is not consumed) or to the end of the text (\Z).
matches = re.findall(
    r'\[\d+/\d+/\d+ - SSR\].*?(?:(?=\[\d+/\d+/\d+ - SSR\])|\Z)',
    s, re.MULTILINE | re.DOTALL)
for i, match in enumerate(matches, 1):
    print("%d: %s" % (i, match.strip()))
The output is:
string to process
[1/1/2015 - SSR] something
[1/2/2015 - SSR] another:
*something here
*another something here
[1/3/2015 - SSR] another one
matches
1: [1/1/2015 - SSR] something
2: [1/2/2015 - SSR] another:
*something here
*another something here
3: [1/3/2015 - SSR] another one
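The test strings above leave out the question's "not relevant, should not be returned" line; with the real input, both [^[]* and the lookahead version would fold that line into the second record. If, as the question states, continuation lines always start with whitespace and a *, one option is to match a header line followed only by such lines. This is a minimal sketch that assumes that rule holds for every continuation line:

```python
import re

text = (
    "some content here\n"
    "[1/1/2015 - SSR] something\n"
    "[1/2/2015 - SSR] another:\n"
    " *something here\n"
    " *another something here\n"
    "not relevant, should not be returned\n"
    "[1/3/2015 - SSR] another one"
)

# Assumed rule: a record is a "[date - SSR]" header line followed by zero or
# more continuation lines that start with optional spaces/tabs and a '*'.
record = re.compile(
    r'^\[\d+/\d+/\d+ - SSR\].*'  # header line ('.' stops at '\n': no DOTALL)
    r'(?:\n[ \t]*\*.*)*',         # any number of '*' continuation lines
    re.MULTILINE,
)

for i, rec in enumerate(record.findall(text), 1):
    print(f"{i}: {rec}")
```

Because DOTALL is not used here, each .* stops at the end of its line, so a line that neither starts with a header nor with whitespace and * ends the record.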