
Python urlparse gives wrong result

I'm trying to separate the different parts of a URL with Python's urlparse, but I seem to be getting the wrong values in the results.

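import urlparse  # Python 2 module, as in the linked documentation

baseline = runSql(conn, "SELECT url FROM malware_traffic WHERE tag = 'baseline';")

# The first column of each returned row holds the URL string
for row in baseline:
    print row[0]
    print urlparse.urlparse(row[0])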

The runSql function just returns a list of URLs. I loop through them and try to parse each URL from the baseline variable, but the way Python parses them seems to be incorrect:

172.217.9.174:443/c2dm/register3
ParseResult(scheme='172.217.9.174', netloc='', path='443/c2dm/register3', params='', query='', fragment='')
connectivitycheck.gstatic.com:80/generate_204
ParseResult(scheme='connectivitycheck.gstatic.com', netloc='', path='80/generate_204', params='', query='', fragment='')
www.google.com:80/gen_204
ParseResult(scheme='www.google.com', netloc='', path='80/gen_204', params='', query='', fragment='')
172.217.9.174:443/auth/devicekey
ParseResult(scheme='172.217.9.174', netloc='', path='443/auth/devicekey', params='', query='', fragment='')

In the results you can clearly see that it is mixing up scheme and netloc as well as including the port in path.

For instance, the first result should be this:

ParseResult(scheme='', netloc='172.217.9.174:443', path='/c2dm/register3', params='', query='', fragment='')

I'm not sure why it's getting messed up.

I'm using practically the same thing as one of the examples in the documentation here: https://docs.python.org/2/library/urlparse.html

So what am I doing wrong, or is it a bug?

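The problem is that your URLs don't have a scheme (the http:// part), so urlparse treats everything before the first colon, here 172.217.9.174, as the scheme. It only recognizes a netloc when it is introduced by //. With http:// prefixed, everything parses as expected: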

>>> from urlparse import urlparse
>>> urlparse('172.217.9.174:443/c2dm/register3')
ParseResult(scheme='172.217.9.174', netloc='', path='443/c2dm/register3', params='', query='', fragment='')
>>> urlparse('http://172.217.9.174:443/c2dm/register3')
ParseResult(scheme='http', netloc='172.217.9.174:443', path='/c2dm/register3', params='', query='', fragment='')
