Substitution regex with groupings

Question

Given a string like '\url{www.mywebsite.com/home/us/index.html}', I would like to replace the part of the URL up to and including the second-to-last forward slash with www.example.com/, so that it becomes:

\url{www.example.com/us/index.html}

I assume that the URL contains at least one forward slash. Here is what I tried:

>>> import re
>>> pattern = r'(\\url\{).*([^/]*/[^/]*\})'
>>> prefix = r'\1www.example.com/\2'
>>> re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com//index.html}'

I'm not sure why the us part is not included in the result, even though I explicitly included the [^/]* within the regex.

Answer 1

The greedy .* consumes as much as it can, so it matches everything up to the last slash. Your second group then matches only /index.html}, with its first [^/]* matching the empty string (because * allows zero repetitions).
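To see this concretely, here is a minimal check of what each group captures. It is a sketch for Python 3, so the backslash and braces in the pattern are escaped and the input is a raw string; the names text, match and groups are only illustrative.

```python
import re

# The question's pattern, with the backslash and braces escaped for Python 3.
pattern = r'(\\url\{).*([^/]*/[^/]*\})'
text = r'\url{www.mywebsite.com/home/us/index.html}'

match = re.search(pattern, text)
groups = match.groups()

# Group 2 starts at the last slash, so "us" has already been eaten by .*
print(groups)  # ('\\url{', '/index.html}')
```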

Include a slash after your .* to force the .* to stop before the second-to-last slash, preventing it from consuming the us that you want to leave for the group to capture:

>>> pattern = r'(\\url\{).*/([^/]*/[^/]*\})'
>>> re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com/us/index.html}'
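If the substitution is needed in more than one place, the corrected pattern could be wrapped in a small helper. The following is only a sketch: replace_url_prefix, URL_PATTERN and new_host are hypothetical names, not part of the original answer, and the code assumes Python 3.

```python
import re

# Hypothetical helper built around the corrected pattern (escaped for Python 3).
URL_PATTERN = re.compile(r'(\\url\{).*/([^/]*/[^/]*\})')

def replace_url_prefix(text, new_host='www.example.com'):
    """Keep the last two path components and put new_host in front of them."""
    # A function replacement means new_host needs no backslash escaping.
    return URL_PATTERN.sub(
        lambda m: m.group(1) + new_host + '/' + m.group(2), text)

result = replace_url_prefix(r'\url{www.mywebsite.com/home/us/index.html}')
unchanged = replace_url_prefix(r'\url{index.html}')  # fewer than two slashes: no match

print(result)     # \url{www.example.com/us/index.html}
print(unchanged)  # \url{index.html}
```

Because .* is greedy, this sketch assumes the string holds a single \url{...}; with several on one line, the match could span across them.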

Answer 2

Alternatively, using lookahead/lookbehind:

import re
# Match everything after the opening '{' up to, but not including, the second-to-last slash:
pattern = r'(?<=\{).*(?=(?:[^/]*/){2})'
prefix = 'www.example.com'
print(re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}'))

Output

\url{www.example.com/us/index.html}

or without using regex at all:

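# Keep only the last two slash-separated components, then rebuild the string.
parts = r'\url{www.mywebsite.com/home/us/index.html}'.split('/')[-2:]
parts = [r'\url{www.example.com', parts[0], parts[1]]
print('/'.join(parts))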
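A variation on the non-regex approach is to split from the right with str.rsplit, which avoids slicing a full list of components. This is a sketch only: keep_last_two and new_prefix are hypothetical names, and returning the input unchanged when there are fewer than two slashes is an assumption about the desired behavior.

```python
def keep_last_two(text, new_prefix=r'\url{www.example.com'):
    """Replace everything before the last two '/'-separated parts with new_prefix."""
    # maxsplit=2 splits only at the last two slashes, counting from the right.
    parts = text.rsplit('/', 2)
    if len(parts) < 3:
        # Fewer than two slashes: nothing to replace (assumed behavior).
        return text
    return '/'.join([new_prefix, parts[1], parts[2]])

rewritten = keep_last_two(r'\url{www.mywebsite.com/home/us/index.html}')
print(rewritten)  # \url{www.example.com/us/index.html}
```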

2 answers

solution1
1 ACCEPTED 2013-05-29 00:54:37

solution2
1 2013-05-29 01:00:23
