
python urljoin not finding the absolute path

I'm trying to get the absolute path, but I don't get the correct result. This is what I'm trying:

Given I have this html page url:

url1 = 'build/en/index.html'

and I have this relative path in the file:

url2  = '/pub-assets/css/indexen.css'

I'm doing:

urljoin(url1, url2)

I expect to get build/pub-assets/css/indexen.css

but that is not what I get. Any suggestions are much appreciated.

If url1 points to a file (rather than a directory), you can rebuild the path yourself: split the URL with urlsplit, then use SplitResult._replace to swap in the new path.

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2

url1 = 'https://example.com/en/index.html'
url2 = 'pub-assets/css/indexen.css'

parts = urlsplit(url1)
# Keep everything up to and including the last '/', then append the relative URL
new_path = parts.path[:parts.path.rfind('/') + 1] + url2
joined = parts._replace(path=new_path)
print(joined.geturl())  # Outputs https://example.com/en/pub-assets/css/indexen.css

This assumes that url1 is an absolute URL and url2 is a relative path without a leading slash.
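The url2 in the question does start with a slash, so the approach above would not apply to it directly. Here is a minimal sketch of one way to handle that case, assuming that 'build' is the site root on disk (the question does not say so explicitly). The helper name resolve_against_root and its parameters are made up for illustration; it only uses posixpath from the standard library.

```python
import posixpath

def resolve_against_root(site_root, page, ref):
    """Hypothetical helper: resolve a reference found in `page`.

    Assumes `site_root` is the directory that '/' refers to.
    """
    if ref.startswith('/'):
        # Root-relative reference: ignore the page's directory entirely
        return posixpath.join(site_root, ref.lstrip('/'))
    # Document-relative reference: resolve against the page's directory
    return posixpath.normpath(posixpath.join(posixpath.dirname(page), ref))

result = resolve_against_root('build', 'build/en/index.html',
                              '/pub-assets/css/indexen.css')
print(result)  # build/pub-assets/css/indexen.css
```

If the site root is somewhere else, only the first argument needs to change.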

Python 3.6.1:

>>> u1 = 'https://example.com/en/index.html'
>>> u2 = 'pub-assets/css/indexen.css'
>>> import urllib.parse
>>> urllib.parse.urljoin(u1, u2)
'https://example.com/en/pub-assets/css/indexen.css'

Python 2.7.14:

>>> u1 = 'https://example.com/en/index.html'
>>> u2 = 'pub-assets/css/indexen.css'
>>> import urlparse
>>> urlparse.urljoin(u1, u2)
'https://example.com/en/pub-assets/css/indexen.css'

Note the changed import. I would double-check your Python version, import statement, and perhaps post more of your program.
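For comparison, this is what urljoin does with the exact values from the question, next to two variants of the relative path. It is only an illustration of the standard-library behaviour on Python 3 (the variable names are mine):

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

base = 'build/en/index.html'

# A leading slash marks the reference as root-relative, so the base path is discarded
root_relative = urljoin(base, '/pub-assets/css/indexen.css')

# Without the leading slash, the reference is resolved against the base's directory
doc_relative = urljoin(base, 'pub-assets/css/indexen.css')

# '..' climbs one directory up from 'build/en/'
parent_relative = urljoin(base, '../pub-assets/css/indexen.css')

print(root_relative)    # /pub-assets/css/indexen.css
print(doc_relative)     # build/en/pub-assets/css/indexen.css
print(parent_relative)  # build/pub-assets/css/indexen.css
```

So the leading slash in url2 is what keeps 'build' out of the result; urljoin has no way of knowing that 'build' is meant to be the root.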
