Currently, I have the following Python regex:
r'^https?://(www.)?domain.com/?(?P<path>.*)/?$'
That I'm replacing with:
r'/\g<path>/'
This works fine except in the scenario where the last character of the string is a slash (/). In that case, the .* greedily consumes the last /, so the subbed string ends up as /path//
Essentially, I'm stripping the domain from an absolute path, turning it into a relative path, and trying to ensure that the relative path both begins and ends with a /.
Any idea how I can exclude the last character from the match if and only if it's a /? It seems I'll probably need some sort of look-ahead, but I'm not sure exactly how to construct it.
Don't use regular expressions for this; use the urlparse module (urllib.parse in Python 3) instead.
Example from the docs:
>>> from urlparse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
params='', query='', fragment='')
>>> o.scheme
'http'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'
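Applied to the question's case, that could look roughly like the sketch below. It is written for Python 3, where the module is urllib.parse. The helper name to_relative_path is made up for illustration, and returning a bare "/" for a URL with no path is an assumption about the desired behaviour. Unlike the regex in the question, this drops any query string or fragment, since urlparse keeps those separate from the path.

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse


def to_relative_path(url):
    """Hypothetical helper: return the URL's path with exactly one
    leading and one trailing slash."""
    # strip('/') removes any leading and trailing slashes from the path.
    path = urlparse(url).path.strip('/')
    # Assumption: a URL without a path collapses to the root "/".
    return '/' + path + '/' if path else '/'


print(to_relative_path('http://www.domain.com/some/path'))
print(to_relative_path('http://www.domain.com/some/path/'))
```

Both calls give the same result, because the trailing slash is stripped before the path is rebuilt.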
Just make the asterisk lazy:
r'^https?://(www.)?domain.com/?(?P<path>.*?)/?$'
The $ at the end ensures that the entire string will be matched, and a trailing slash, if present, will always be matched by the /?.
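A quick check of the lazy version, using the pattern and replacement exactly as given in this thread; the sample URLs are made up for illustration. Note that the dots in the pattern are unescaped, so they match any character; writing them as \. would be stricter.

```python
import re

# Pattern from the answer: the path group uses the lazy quantifier .*?
pattern = re.compile(r'^https?://(www.)?domain.com/?(?P<path>.*?)/?$')
replacement = r'/\g<path>/'

# Illustrative inputs, one without and one with a trailing slash.
without_slash = pattern.sub(replacement, 'http://www.domain.com/path')
with_slash = pattern.sub(replacement, 'https://domain.com/path/')

print(without_slash)
print(with_slash)
```

Because .*? takes as little as possible, the optional /? before $ gets to consume the trailing slash, so both inputs come out as /path/.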