Regular expression to match everything to the end of a string, excluding the last character if it's a slash (/)

Question

Currently, I have the following Python regex:

r'^https?://(www\.)?domain\.com/?(?P<path>.*)/?$'

and I'm substituting matches with:

r'/\g<path>/'

This works fine except when the last character of the string is a slash (/). In that case, the greedy .* consumes that final /, so the substituted string ends up as /path//.

Essentially, I'm stripping the domain from an absolute URL to turn it into a relative path, and I want to ensure that the relative path both begins and ends with a /.

Any idea how I can exclude the last character from the match if, and only if, it's a /? It seems I'll need some sort of lookahead, but I'm not sure how to construct it.
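The problem described above can be reproduced with a short script. This is a minimal sketch: `to_relative` is a hypothetical helper name, `domain.com` is the placeholder host from the question, and the dots in the pattern are escaped so they match literal periods rather than any character.

```python
import re

# Greedy pattern from the question (dots escaped so they match literal periods).
GREEDY = re.compile(r'^https?://(www\.)?domain\.com/?(?P<path>.*)/?$')

def to_relative(url):
    # Hypothetical helper: replace scheme + host with a leading slash and
    # append a trailing slash around the captured path.
    return GREEDY.sub(r'/\g<path>/', url)

print(to_relative('http://domain.com/path'))   # /path/
print(to_relative('http://domain.com/path/'))  # /path//  (greedy .* swallowed the final slash)
```

The second call shows the bug: `.*` captures `path/`, leaving nothing for the trailing `/?`, so the replacement adds a second slash.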

Answer 1

Don't use regular expressions for this; use the urlparse module instead (urllib.parse in Python 3).

Example from the docs:

>>> from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            params='', query='', fragment='')
>>> o.scheme
'http'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'
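Applied to the question's task, the parsed `path` attribute can be normalized so it starts and ends with exactly one slash. This is a sketch under stated assumptions: `to_relative` is a hypothetical helper name, and the choice to return a single `/` for a bare host is a design decision, not something from the original answer.

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

def to_relative(url):
    """Hypothetical helper: return the path of url with one leading and one trailing '/'."""
    # urlparse separates scheme, host, path, query and fragment for us.
    path = urlparse(url).path.strip('/')
    # A bare host ('http://domain.com' or 'http://domain.com/') collapses to '/'.
    return '/' + path + '/' if path else '/'

print(to_relative('http://www.domain.com/path'))        # /path/
print(to_relative('https://domain.com/path/to/page/'))  # /path/to/page/
print(to_relative('http://domain.com'))                 # /
```

Unlike the regex, this accepts any host and silently drops query strings and fragments; `strip('/')` also collapses repeated leading or trailing slashes. Check the host via `urlparse(url).netloc` if only one domain should be rewritten.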

Answer 2

Just make the asterisk lazy:

r'^https?://(www\.)?domain\.com/?(?P<path>.*?)/?$'

Because .*? is lazy, it matches as few characters as possible, while the $ anchor forces the overall match to reach the end of the string. A trailing slash, if present, is therefore always left for the final /? to match.
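A quick check of the lazy pattern against URLs with and without a trailing slash. This is a minimal sketch; `to_relative` is a hypothetical helper name, and the dots are escaped so they match literal periods.

```python
import re

# Lazy pattern from this answer (dots escaped so they match literal periods).
LAZY = re.compile(r'^https?://(www\.)?domain\.com/?(?P<path>.*?)/?$')

def to_relative(url):
    # Hypothetical helper: the lazy .*? leaves a single trailing slash for /? to consume.
    return LAZY.sub(r'/\g<path>/', url)

print(to_relative('http://domain.com/path'))       # /path/
print(to_relative('http://domain.com/path/'))      # /path/
print(to_relative('https://www.domain.com/a/b/'))  # /a/b/
```

Only one trailing slash is absorbed by `/?`, so an input such as `http://domain.com/path//` still produces `/path//`.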


solution1: score 4, posted 2011-11-04 20:17:06

solution2: score 3, accepted, posted 2011-11-04 20:16:14