
Regex matches the whole line instead of the text between the tags

I am new to regex and just testing it out. Even after looking at examples, my regex matches almost the whole line instead of only the text between the tags.

re.findall(r'<i>(.*)</i>', 'test <i>abc</i> <i>def</i>')

['abc</i> <i>def']

Why is it not matching just the text between the tags, giving me ['abc', 'def']?

You are using .*, which is greedy: it consumes as much as it can, so the match runs up to the last </i> on the line. Add ? after the * to make it non-greedy.

>>> re.findall(r'<i>(.*?)</i>', 'test <i>abc</i> <i>def</i>')
['abc', 'def']
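
As a self-contained sketch of the difference, the snippet below runs both patterns on the string from the question. The variable names (text, greedy, lazy) are only illustrative.

```python
import re

text = 'test <i>abc</i> <i>def</i>'

# Greedy: .* runs on to the last </i> in the string, giving one long match.
greedy = re.findall(r'<i>(.*)</i>', text)

# Non-greedy: .*? stops at the first </i> it can reach, once per tag pair.
lazy = re.findall(r'<i>(.*?)</i>', text)

print(greedy)  # ['abc</i> <i>def']
print(lazy)    # ['abc', 'def']
```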

From the re documentation:

The *, +, and ? qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
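
The documentation's example can be checked directly. This is a small sketch that matches the patterns <.*> and <.*?> against '<H1>title</H1>'; the variable names are illustrative, not part of the docs.

```python
import re

html = '<H1>title</H1>'

# Greedy: .* extends to the last '>' in the string.
greedy_match = re.match(r'<.*>', html).group(0)

# Non-greedy: .*? stops at the first '>' that completes the match.
lazy_match = re.match(r'<.*?>', html).group(0)

print(greedy_match)  # <H1>title</H1>
print(lazy_match)    # <H1>
```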
