I am new to regex and just testing it out. After looking at examples, my problem is that my regex matches almost the whole line instead of just the text between the tags.
re.findall(r'<i>(.*)</i>', 'test <i>abc</i> <i>def</i>')
['abc</i> <i>def']
Why is it not matching just the text between the tags, giving me abc and def?
You are using .* which is greedy. Add ? after it (giving .*?) to make it non-greedy.
>>> re.findall(r'<i>(.*?)</i>', 'test <i>abc</i> <i>def</i>')
['abc', 'def']
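To see the difference side by side, here is a minimal sketch that runs both patterns against the string from the question. The variable names (text, greedy, lazy) are only for illustration.

```python
import re

text = 'test <i>abc</i> <i>def</i>'

# Greedy: .* grabs as much as it can, so the group runs from the
# first <i> all the way to the last </i> on the line.
greedy = re.findall(r'<i>(.*)</i>', text)

# Non-greedy: .*? stops at the first </i> it can reach, so each
# tag pair produces its own match.
lazy = re.findall(r'<i>(.*?)</i>', text)

print(greedy)  # ['abc</i> <i>def']
print(lazy)    # ['abc', 'def']
```

Note that . does not match a newline by default, so if the tag contents can span several lines you would likely also need the re.DOTALL flag.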
From the re documentation:
The *, +, and ? qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
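The documentation's example can be checked directly. This is a small sketch using re.match on the same '<H1>title</H1>' string; the variable names are made up for the illustration.

```python
import re

html = '<H1>title</H1>'

# Greedy: <.*> extends to the last > in the string.
greedy_match = re.match(r'<.*>', html).group()

# Non-greedy: <.*?> stops at the first > it finds.
lazy_match = re.match(r'<.*?>', html).group()

print(greedy_match)  # <H1>title</H1>
print(lazy_match)    # <H1>
```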