I'm trying to get an absolute path, but I don't get the correct result. This is what I'm trying:
Given this HTML page URL:
url1 = 'build/en/index.html'
and I have this relative path in the file:
url2 = '/pub-assets/css/indexen.css'
I'm doing:
urljoin(url1, url2)
I expect to get build/pub-assets/css/indexen.css,
but that is not what I get. Any suggestions are much appreciated.
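For reference, here is a quick Python 3 check of what urljoin does with the inputs from the question. Because url2 starts with '/', urljoin treats it as a root-relative reference: it keeps only the base's scheme and host (there are none here) and discards the whole base path. Only a reference without the leading '/' is resolved against the base's directory, build/en/. The '../' input is a hypothetical variation added for illustration; it is not part of the original question.

```python
from urllib.parse import urljoin

url1 = 'build/en/index.html'

# Leading '/' means root-relative: the base path is discarded entirely.
root_relative = urljoin(url1, '/pub-assets/css/indexen.css')

# No leading '/': resolved against the base "directory" build/en/.
same_dir = urljoin(url1, 'pub-assets/css/indexen.css')

# Hypothetical input: '../' steps up from build/en/ to build/.
parent_dir = urljoin(url1, '../pub-assets/css/indexen.css')

print(root_relative)  # /pub-assets/css/indexen.css
print(same_dir)       # build/en/pub-assets/css/indexen.css
print(parent_dir)     # build/pub-assets/css/indexen.css
```

So urljoin on its own only yields build/pub-assets/css/indexen.css if the reference is written relative to the page (here '../pub-assets/...').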
If your url1 is a file (instead of a directory), you can modify its path yourself with urlsplit and SplitResult._replace:
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2
url1 = 'https://example.com/en/index.html'
url2 = 'pub-assets/css/indexen.css'
p = urlsplit(url1).path
new_path = p[:p.rfind('/') + 1] + url2  # keep the path up to and including the last '/', then append url2
joined = urlsplit(url1)._replace(path=new_path)
print(joined.geturl())  # Outputs https://example.com/en/pub-assets/css/indexen.css
This assumes that url1 is an absolute URL and url2 is a relative path.
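The question's url1 is a plain relative file path rather than a full URL, so the same "directory of url1 plus url2" idea can also be sketched with posixpath. This sketch additionally assumes that a leading '/' in url2 is meant to point at the build/ output directory, which is what the expected result in the question implies. The helper name resolve_asset and its site_root parameter are hypothetical, not part of any library.

```python
import posixpath


def resolve_asset(page_path, asset_path, site_root):
    """Resolve an asset reference found in page_path to a path on disk.

    Hypothetical helper: a leading '/' in asset_path is treated as relative
    to site_root (assumed to be the build output directory); otherwise the
    asset is resolved against the directory that contains page_path.
    """
    if asset_path.startswith('/'):
        joined = posixpath.join(site_root, asset_path.lstrip('/'))
    else:
        joined = posixpath.join(posixpath.dirname(page_path), asset_path)
    # normpath collapses '.' and '..' segments, e.g. build/en/../x -> build/x
    return posixpath.normpath(joined)


print(resolve_asset('build/en/index.html', '/pub-assets/css/indexen.css', 'build'))
# build/pub-assets/css/indexen.css
print(resolve_asset('build/en/index.html', 'pub-assets/css/indexen.css', 'build'))
# build/en/pub-assets/css/indexen.css
```

Keeping the site root as an explicit parameter avoids guessing which directory '/' refers to, which urljoin cannot know for plain file paths.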
Python 3.6.1:
>>> u1 = 'https://example.com/en/index.html'
>>> u2 = 'pub-assets/css/indexen.css'
>>> import urllib.parse
>>> urllib.parse.urljoin(u1, u2)
'https://example.com/en/pub-assets/css/indexen.css'
Python 2.7.14:
>>> u1 = 'https://example.com/en/index.html'
>>> u2 = 'pub-assets/css/indexen.css'
>>> import urlparse
>>> urlparse.urljoin(u1, u2)
'https://example.com/en/pub-assets/css/indexen.css'
Note the different import: urljoin lives in urllib.parse on Python 3 and in urlparse on Python 2. I would double-check your Python version and import statement, and perhaps post more of your program.
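One way to run the check suggested above is to print the interpreter version and confirm which module urljoin actually came from. This is a minimal diagnostic sketch that picks the import based on the major version number.

```python
import sys

# Show which interpreter is running; urljoin lives in a different module on 2 and 3.
print(sys.version)

if sys.version_info[0] >= 3:
    from urllib.parse import urljoin
else:
    from urlparse import urljoin

# Prints 'urllib.parse' on Python 3 and 'urlparse' on Python 2.
print(urljoin.__module__)
```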