
Improve regular expression to end on quotation

I have the following regular expression:

>>> re.findall('http://www.rottentomatoes.com/.+', html)
['http://www.rottentomatoes.com/m/1129132-torque" class="see-all">Read More About This Movie On Rotten Tomatoes</a>']

How would I get this to match only up to the first " character? I am trying to get the result to be:

http://www.rottentomatoes.com/m/1129132-torque

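Use a non-greedy quantifier (.+? instead of .+) so the match stops at the first " , and put the URL in a capture group so the quote itself is left out of the result: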

>>> import re
>>> html = 'http://www.rottentomatoes.com/m/1129132-torque" class="see-all">Read More About This Movie On Rotten Tomatoes</a>'
>>> re.search(r'(http://www\.rottentomatoes\.com/.+?)"', html).group(1)
'http://www.rottentomatoes.com/m/1129132-torque'
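
To see why the ? matters, here is a small sketch that runs the greedy and the non-greedy version of the same pattern against the sample string from the question. The variable names greedy and lazy are only illustrative.

```python
import re

html = ('http://www.rottentomatoes.com/m/1129132-torque" class="see-all">'
        'Read More About This Movie On Rotten Tomatoes</a>')

# Greedy: .+ consumes as much as it can, then backtracks to the LAST quote.
greedy = re.search(r'(http://www\.rottentomatoes\.com/.+)"', html).group(1)

# Non-greedy: .+? expands one character at a time and stops at the FIRST quote.
lazy = re.search(r'(http://www\.rottentomatoes\.com/.+?)"', html).group(1)

print(greedy)  # runs on past the URL, up to the quote after see-all
print(lazy)    # just the URL
```

The greedy version still ends at a quote, only the wrong one, which is why adding the " to the pattern is not enough on its own.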

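Just add the character (") where you want the match to stop, and add ? after .+ to make it non-greedy, so that it stops at the first " rather than the last. Note that the quote itself is part of the match here: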

>>> html = 'http://www.rottentomatoes.com/m/1129132-torque" class="see-all">Read More About This Movie On Rotten Tomatoes</a>'
>>> re.findall(r'http://www\.rottentomatoes\.com/.+?"', html)
['http://www.rottentomatoes.com/m/1129132-torque"']
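
If the trailing quote should not appear in the findall result, one possible variation (not part of the answers above) is a negated character class, [^"]+, which matches everything that is not a quote. A capture group around the URL works too, since findall returns only the group when the pattern contains exactly one. A minimal sketch using the same sample string:

```python
import re

html = ('http://www.rottentomatoes.com/m/1129132-torque" class="see-all">'
        'Read More About This Movie On Rotten Tomatoes</a>')

# [^"]+ matches one or more characters that are not a double quote,
# so the match ends just before the first quote.
urls_negated = re.findall(r'http://www\.rottentomatoes\.com/[^"]+', html)

# With a single capture group, findall returns the group, not the whole match,
# so the closing quote is required by the pattern but dropped from the result.
urls_grouped = re.findall(r'(http://www\.rottentomatoes\.com/.+?)"', html)

print(urls_negated)
print(urls_grouped)
```

Both lists contain only the bare URL. The negated class also avoids backtracking, though for short strings like this the difference is unlikely to matter.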
