
Substitution regex with groupings

Given a string like \url{www.mywebsite.com/home/us/index.html}, I would like to replace the part of the URL up to the second-to-last forward slash with www.example.com/, so that it becomes:

\url{www.example.com/us/index.html}

I assume that the URL contains at least one forward slash. Here is what I tried:

>>> import re
>>> pattern = r'(\\url\{).*([^/]*/[^/]*\})'
>>> prefix = r'\1www.example.com/\2'
>>> re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com//index.html}'

I'm not sure why the us part is not included in the result, even though I explicitly included the [^/]* within the regex.

The greedy .* matches as much as it can, so it runs all the way up to the last slash. Your second group then matches only /index.html}, with the first [^/]* matching the empty string (because * allows zero repetitions).
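To see where the greedy .* stops, one can temporarily wrap it in its own group and inspect what each group captures. This is a sketch adjusted for Python 3 (raw-string input, with the backslash and braces in the pattern escaped); the extra group around .* exists only for inspection:

```python
import re

s = r'\url{www.mywebsite.com/home/us/index.html}'

# The question's pattern, with the greedy .* wrapped in a group so we can
# see how much it consumes. Backslash and braces are escaped for Python 3.
pattern = r'(\\url\{)(.*)([^/]*/[^/]*\})'
m = re.search(pattern, s)

print(m.group(2))  # www.mywebsite.com/home/us  (.* swallowed "us")
print(m.group(3))  # /index.html}  (the first [^/]* matched "")
```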

Put a literal slash after your .* so that it has to stop just before the second-to-last slash. That keeps it from consuming the us you want the group to capture:

>>> pattern = r'(\\url\{).*/([^/]*/[^/]*\})'
>>> re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com/us/index.html}'
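A small wrapper around the corrected pattern makes its requirement visible: the pattern needs two slashes after \url{, so a URL with only one slash does not match and re.sub returns it unchanged. The function name replace_prefix is hypothetical, and the sketch assumes Python 3:

```python
import re

# Corrected pattern, escaped for Python 3:
# group 1 = "\url{", group 2 = last two path components plus the closing brace.
PATTERN = re.compile(r'(\\url\{).*/([^/]*/[^/]*\})')


def replace_prefix(text, new_prefix='www.example.com/'):
    """Hypothetical helper: swap everything up to the second-to-last slash."""
    # A function replacement keeps new_prefix from being parsed as a
    # template (backslashes or group references inside it stay literal).
    return PATTERN.sub(lambda m: m.group(1) + new_prefix + m.group(2), text)


print(replace_prefix(r'\url{www.mywebsite.com/home/us/index.html}'))
# \url{www.example.com/us/index.html}
print(replace_prefix(r'\url{www.mywebsite.com/index.html}'))
# \url{www.mywebsite.com/index.html}  (one slash only: no match, unchanged)
```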

Alternatively, using lookbehind and lookahead assertions:

import re
# match everything after '{' up to (but not including) the second-to-last slash
pattern = r'(?<=\{).*(?=(?:[^/]*/){2})'
prefix = 'www.example.com'
print(re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}'))

Output

\url{www.example.com/us/index.html}
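Because lookbehind and lookahead assertions consume no characters, the match itself covers only the text that gets replaced; the \url{ prefix and the trailing /us/index.html} stay outside it. A quick Python 3 check of what the pattern actually matches (raw-string input and the escaped \{ are adjustments for Python 3):

```python
import re

s = r'\url{www.mywebsite.com/home/us/index.html}'
# The greedy .* backtracks to the last position that still has two
# "non-slashes, then slash" runs ahead, i.e. just before the
# second-to-last slash.
pattern = r'(?<=\{).*(?=(?:[^/]*/){2})'

m = re.search(pattern, s)
print(m.group())  # www.mywebsite.com/home
print(re.sub(pattern, 'www.example.com', s))  # \url{www.example.com/us/index.html}
```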

Or, without using a regex at all:

s = r'\url{www.mywebsite.com/home/us/index.html}'
parts = s.split('/')[-2:]  # last two path components: ['us', 'index.html}']
print('/'.join([r'\url{www.example.com'] + parts))
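A variant of the same idea uses str.rsplit with maxsplit=2, which splits only at the last two slashes. The length check is an addition not present in the answer: with fewer than two slashes, taking the last two pieces would glue the old host back into the result, so this sketch returns the input unchanged instead. The function name swap_prefix is hypothetical, and the code assumes Python 3:

```python
def swap_prefix(text, new_prefix=r'\url{www.example.com'):
    """Hypothetical helper: replace everything before the second-to-last slash."""
    # rsplit with maxsplit=2 splits at most at the last two slashes, e.g.
    # ['\\url{www.mywebsite.com/home', 'us', 'index.html}']
    parts = text.rsplit('/', 2)
    if len(parts) < 3:
        # Fewer than two slashes: there is no second-to-last slash to cut at.
        return text
    return '/'.join([new_prefix] + parts[1:])


print(swap_prefix(r'\url{www.mywebsite.com/home/us/index.html}'))
# \url{www.example.com/us/index.html}
```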

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM