Given a string like
\url{www.mywebsite.com/home/us/index.html}
I would like to replace the part of the URL up to the second-to-last forward slash with www.example.com/, so that it becomes:
\url{www.example.com/us/index.html}
I assume that the URL contains at least two forward slashes (otherwise there is no second-to-last slash). This is what I tried:
>>> import re
>>> pattern = r'(\\url\{).*([^/]*/[^/]*\})'
>>> prefix = r'\1www.example.com/\2'
>>> re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com//index.html}'
I'm not sure why the us part is not included in the result, even though I explicitly included [^/]* in the regex.
The greedy .* matches everything up to the last slash. Your second group then matches only /index.html}, with the first [^/]* matching nothing (because * can match zero characters).
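A quick way to confirm this is to inspect what each group captures with the original pattern. This sketch escapes the backslash in \url (so the pattern also compiles on Python 3, where \u in a regex is an escape error) and uses a raw string for the input; the variable names are only illustrative.

```python
import re

# The question's pattern, with the backslash escaped and braces made explicit.
pattern = r'(\\url\{).*([^/]*/[^/]*\})'
text = r'\url{www.mywebsite.com/home/us/index.html}'

m = re.search(pattern, text)
print(repr(m.group(1)))  # '\\url{'
print(repr(m.group(2)))  # '/index.html}'  (the first [^/]* matched nothing)
```

Because .* is greedy, it keeps www.mywebsite.com/home/us for itself and only gives back the shortest tail that still lets group 2 match.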
Include a slash after your .* to force it to stop before the second-to-last slash, so that it does not consume the us you want the group to capture:
>>> pattern = r'(\\url\{).*/([^/]*/[^/]*\})'
>>> re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}')
'\\url{www.example.com/us/index.html}'
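To check that the fixed pattern always keeps exactly the last two path segments, here is a small sketch wrapping it in a function and running it on paths of different depths. The helper name rehost and the extra sample URLs are hypothetical, chosen only for illustration.

```python
import re

# Fixed pattern: the explicit '/' after .* makes the greedy .* give back
# the last two path segments, which group 2 then captures.
PATTERN = re.compile(r'(\\url\{).*/([^/]*/[^/]*\})')
REPLACEMENT = r'\1www.example.com/\2'

def rehost(text):
    """Hypothetical helper: replace everything before the last two segments."""
    return PATTERN.sub(REPLACEMENT, text)

samples = [
    r'\url{www.mywebsite.com/home/us/index.html}',
    r'\url{www.mywebsite.com/a/b/c/page.html}',
    r'\url{www.mywebsite.com/page.html}',  # only one slash: no match
]
for s in samples:
    print(rehost(s))
# \url{www.example.com/us/index.html}
# \url{www.example.com/c/page.html}
# \url{www.mywebsite.com/page.html}
```

When the URL has fewer than two slashes the pattern simply does not match, so re.sub returns the input unchanged.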
Alternatively, using lookahead/lookbehind:
import re
# Match everything after '{' up to (but not including) the second-to-last slash:
pattern = r'(?<=\{).*(?=(?:[^/]*/){2})'
prefix = 'www.example.com'
print(re.sub(pattern, prefix, r'\url{www.mywebsite.com/home/us/index.html}'))
Output
\url{www.example.com/us/index.html}
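Lookbehind and lookahead are zero-width: they constrain where the match may start and end without becoming part of it. Printing the match itself shows exactly which text gets replaced, which is why the replacement string needs no backreferences. A minimal sketch (the single-slash sample is a hypothetical input):

```python
import re

# (?<=\{) requires a '{' just before the match start; (?=(?:[^/]*/){2})
# requires at least two more slashes after the match end.
pattern = r'(?<=\{).*(?=(?:[^/]*/){2})'
text = r'\url{www.mywebsite.com/home/us/index.html}'

m = re.search(pattern, text)
print(m.group())  # www.mywebsite.com/home
print(re.sub(pattern, 'www.example.com', text))  # \url{www.example.com/us/index.html}

# With only one slash the lookahead can never succeed, so nothing changes.
print(re.sub(pattern, 'www.example.com', r'\url{www.mywebsite.com/index.html}'))
```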
Or, without a regex at all:
parts = r'\url{www.mywebsite.com/home/us/index.html}'.split('/')[-2:]
parts = [r'\url{www.example.com'] + parts
print('/'.join(parts))
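The split version above hard-codes the \url{ prefix and assumes at least two slashes; with only one slash, [-2:] would also pick up the \url{ part. A sketch of a slightly more defensive variant uses str.rsplit with maxsplit=2, which splits only at the last two slashes. The function name rehost_split and its parameters are hypothetical.

```python
def rehost_split(text, new_host='www.example.com', opener=r'\url{'):
    """Hypothetical helper: keep the last two path segments, replace the rest."""
    if not text.startswith(opener):
        return text  # not a \url{...} string; leave it alone
    body = text[len(opener):]
    # maxsplit=2 from the right: [everything before, second-to-last segment, last segment]
    parts = body.rsplit('/', 2)
    if len(parts) < 3:
        return text  # fewer than two slashes: no second-to-last slash to cut at
    return opener + new_host + '/' + parts[1] + '/' + parts[2]

print(rehost_split(r'\url{www.mywebsite.com/home/us/index.html}'))
# \url{www.example.com/us/index.html}
```

Like the regex versions, it leaves a URL with fewer than two slashes unchanged rather than producing a malformed result.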