I have a multiline string, and I want a regular expression to grab the text between two patterns. For example, here I am trying to match everything between the title and the date:
import re
s = """\n#here's a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30"""
re.findall(r'#.+\n',s)[0][1:-1] # this grabs the title
Out: "here's a title"
re.findall(r'Posted on .+',s)[0][10:] # this grabs the date (no trailing newline in s, so none is required)
Out: "11-09-2014 02:32:30"
re.findall(r'^[#\W+]',s) # try to grab everything after the title
Out: ['\n'] # but it only grabs until the end of line
>>> s = '''\n#here's a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30'''
>>> m1 = re.search(r'^#.+$', s, re.MULTILINE)
>>> m2 = re.search(r'^Posted on ', s, re.MULTILINE)
>>> m1.end()
16
>>> m2.start()
34
>>> s[m1.end():m2.start()]
'\n\nhello world!!!\n\n'
Don't forget to check that m1 and m2 are not None.
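The None check can be folded into a small helper. This is only a sketch: the function name text_between is made up for illustration, and stripping the surrounding newlines is an assumption about what the result should look like.

```python
import re

def text_between(s):
    """Return the text between the '#' title line and the 'Posted on' line, or None."""
    m1 = re.search(r'^#.+$', s, re.MULTILINE)
    m2 = re.search(r'^Posted on ', s, re.MULTILINE)
    # Either marker may be missing, or they may appear in the wrong order.
    if m1 is None or m2 is None or m2.start() < m1.end():
        return None
    # Drop the blank lines that surround the body.
    return s[m1.end():m2.start()].strip('\n')

s = "\n#here's a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30"
body = text_between(s)
print(repr(body))
```

Returning None instead of raising keeps the caller in charge of what a missing title or date should mean.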
>>> re.findall(r'\n([^#].*)Posted', s, re.S)
['\nhello world!!!\n\n']
If you want to avoid the newlines:
>>> re.findall(r'^([^#\n].*?)\n+Posted', s, re.S | re.M)
['hello world!!!']
You could match all using one regular expression.
>>> s = '''\n#here's a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30'''
>>> re.search(r'#([^\n]+)\s+([^\n]+)\s+\D+([^\n]+)', s).groups()
("here's a title", 'hello world!!!', '11-09-2014 02:32:30')
You should use a capturing group, written with parentheses:
result = re.search(r'#[^\n]+\n+(.*?)\n+Posted on [^\n]*', s, re.MULTILINE | re.DOTALL)
result.group(1) # 'hello world!!!' (the non-greedy .*? leaves the trailing newlines out)
Here I've used search, but you can still use findall if the same string may contain multiple matches.
If you want to capture the title, the content and the date, you can use multiple groups:
result = re.search(r'#([^\n]+)\n+(.*?)\n+Posted on ([^\n]*)', s, re.MULTILINE | re.DOTALL)
result.group(1) # The title
result.group(2) # The contents
result.group(3) # The date
Capturing all three in the same regex is much better than using one regex for each part, especially if your multiline string may contain multiple matches (where syncing your individual findall results together could easily lead to wrong title-content-date combinations).
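To illustrate the multiple-match case, here is a rough sketch. The two-post input is invented for the example, and the content group is non-greedy so that each match stops at its own "Posted on" line instead of running on to the last one.

```python
import re

# Hypothetical input: two posts laid out like the string in the question.
posts = (
    "\n#first title\n\nfirst body\n\nPosted on 11-09-2014 02:32:30\n"
    "\n#second title\n\nline one\nline two\n\nPosted on 12-09-2014 10:00:00\n"
)

pattern = r'#([^\n]+)\n+(.*?)\n+Posted on ([^\n]*)'

# Each match carries its own title, content and date, so nothing has to be
# re-aligned afterwards.
records = [m.groups() for m in re.finditer(pattern, posts, re.DOTALL)]
for title, content, date in records:
    print(title, repr(content), date)
```

Because every tuple comes from a single match, a post with a missing date simply fails to match rather than shifting the dates of the posts after it.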
If you are going to use this regex a lot, consider compiling it once for performance:
regex = re.compile(r'#([^\n]+)\n+(.*?)\n+Posted on ([^\n]*)', re.MULTILINE | re.DOTALL)
# ...
result = regex.search(s)
result = regex.search('another multiline string, ...')
Use a capturing group with a non-greedy match (.*?), and give the group a name for easier lookup.
>>> s = '\n#here\'s a title\n\nhello world!!!\n\nPosted on 11-09-2014 02:32:30'
>>> pattern = r'\s*#[\w \']+\n+(?P<content>.*?)\n+Posted on'
>>> a = re.match(pattern, s, re.S)  # re.S lets the content span several lines
>>> a.group('content')
'hello world!!!'
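The same idea extends to naming all three parts. The following is a sketch under assumptions: the group names title and date are chosen here for illustration, the sample body is made two lines long on purpose, and re.DOTALL is used so that the content group can cross line breaks.

```python
import re

# Variant of the question's string with a two-line body (made up for the example).
s = "\n#here's a title\n\nfirst line\nsecond line\n\nPosted on 11-09-2014 02:32:30"

pattern = re.compile(
    r"\s*#(?P<title>[^\n]+)\n+(?P<content>.*?)\n+Posted on (?P<date>[^\n]+)",
    re.DOTALL,  # let '.' match newlines inside the content group
)

m = pattern.match(s)
# groupdict() returns the named groups as a plain dict; fall back to {} on no match.
fields = m.groupdict() if m else {}
print(fields)
```

Looking fields up by name keeps the calling code readable if the pattern later gains or reorders groups.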