
Regular expression to match everything to the end of a string, excluding the last character if it's a slash (/)

Currently, I have the following Python regex:

r'^https?://(www.)?domain.com/?(?P<path>.*)/?$'

That I'm replacing with:

r'/\g<path>/'

This works fine except when the last character of the string is a slash (/). In that case, the .* greedily consumes the final /, so the substituted string ends up as /path//.

Essentially, I'm stripping the domain from an absolute URL, turning it into a relative path, and trying to ensure that the relative path both begins and ends with a /.

Any idea how I can exclude the last character from the match if and only if it's a slash? It seems I'll probably need some sort of look-ahead, but I'm not sure exactly how to construct it.

Don't use regular expressions for this; use the urlparse module instead (in Python 3 it lives at urllib.parse).

Example from the docs:

>>> from urlparse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            params='', query='', fragment='')
>>> o.scheme
'http'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'
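Applied to the question, this approach might look like the sketch below. It is only an illustration: the helper name relative_path is made up here, the import assumes Python 3, and it assumes that an empty path should become a single / and that any query string or fragment can be dropped.

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse


def relative_path(url):
    """Return the URL's path with one leading and one trailing slash.

    Hypothetical helper, not part of urlparse itself.
    """
    # .path excludes scheme, host, query and fragment;
    # strip('/') removes any slashes at either end.
    path = urlparse(url).path.strip('/')
    # Assumption: a bare domain maps to '/' rather than '//'.
    return '/' + path + '/' if path else '/'


print(relative_path('http://www.domain.com/path/'))
print(relative_path('http://www.domain.com/path'))
```

Both calls print /path/. Unlike the regex, this does not check that the host is domain.com; compare urlparse(url).netloc yourself if that matters.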

Just make the asterisk lazy:

r'^https?://(www.)?domain.com/?(?P<path>.*?)/?$'

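The lazy .*? matches as few characters as possible, so it leaves a trailing slash, if present, to be matched by the /? that follows. The $ at the end still forces the match to run to the end of the string, so the rest of the path is not cut short.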
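A quick way to check this is the sketch below. The names PATTERN and to_relative are made up for the illustration. It also escapes the dots in the host name, which the pattern above does not, so that . only matches a literal dot.

```python
import re

# Same idea as the pattern above, with the dots escaped
# (an addition in this sketch, not in the original pattern).
PATTERN = re.compile(r'^https?://(www\.)?domain\.com/?(?P<path>.*?)/?$')


def to_relative(url):
    """Rewrite an absolute domain.com URL as /path/ (hypothetical helper)."""
    return PATTERN.sub(r'/\g<path>/', url)


print(to_relative('http://www.domain.com/path/'))
print(to_relative('http://www.domain.com/path'))
```

Both calls print /path/. Note that a bare domain such as http://domain.com/ still comes out as //, because the path group is empty there.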

