
Escaping a query string with special characters in Python

I have some pretty messy URLs that I got via scraping. The problem is that they contain spaces or other special characters in the path and query string. Here are some examples:

http://www.example.com/some path/to the/file.html
http://www.example.com/some path/?file=path to/file name.png&name=name.me

So, is there an easy and robust way to escape these URLs so that I can pass them to urlopen? I tried urllib.quote, but it also escapes the '?', '&' and '=' in the query string, and it mangles the protocol as well (the ':' in 'http://' gets escaped). Currently I use a regex to separate the protocol, path and query string and escape each part separately, but there are cases where they aren't separated properly. Any advice is appreciated.
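Instead of splitting with a regex, one option is to let the standard library parse the URL and then escape each component on its own. The sketch below assumes Python 3, where urllib.quote lives at urllib.parse.quote. escape_url is a hypothetical helper name, and the sketch assumes the scraped paths are not already partly percent-encoded (an existing '%' in the path would be encoded again as '%25').

```python
from urllib.parse import urlsplit, urlunsplit, quote, parse_qsl, urlencode


def escape_url(url):
    """Hypothetical helper: percent-encode the path and query of a messy URL."""
    # urlsplit separates scheme, netloc, path, query and fragment for us.
    parts = urlsplit(url)
    # Escape the path, keeping '/' as the segment separator.
    path = quote(parts.path, safe='/')
    # Split the query into (name, value) pairs first so '&' and '=' keep their
    # structural meaning, then re-encode each pair (spaces become %20).
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(pairs, quote_via=quote, safe='/')
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(escape_url('http://www.example.com/some path/to the/file.html'))
print(escape_url('http://www.example.com/some path/?file=path to/file name.png&name=name.me'))
```

Both example URLs come out with their spaces encoded and their delimiters intact, which should be acceptable to urlopen. One consequence of this design: parse_qsl follows form-encoding rules, so a '+' in a scraped query value is treated as a space.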

By default, urllib.quote percent-encodes every character except letters, digits, '_', '.', '-' and '/'. You can pass a string of additional characters to leave alone as the second argument (safe):

>>> import urllib  # Python 2; in Python 3 this is urllib.parse.quote
>>> urllib.quote('http://www.example.com/some path/?file=path to/file name.png&name=name.me',
...              '/:?&=')
'http://www.example.com/some%20path/?file=path%20to/file%20name.png&name=name.me'
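For comparison, here is the same idea in Python 3, where the function is urllib.parse.quote and the second argument is named safe. Printing the default result alongside shows why the default alone is not enough for a whole URL.

```python
from urllib.parse import quote  # Python 3 location of Python 2's urllib.quote

url = 'http://www.example.com/some path/?file=path to/file name.png&name=name.me'

# Default safe='/': ':', '?', '&' and '=' are all percent-encoded,
# so the result is no longer a usable URL.
default_quoted = quote(url)
# Marking the URL delimiters as safe leaves the structure intact
# and only encodes the spaces.
delimiters_kept = quote(url, safe='/:?&=')

print(default_quoted)
print(delimiters_kept)
```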

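But this is tricky to get right semi-manually: if a parameter value itself contains a literal '&' or '=', it will be left unescaped and change the meaning of the query string.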
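To see where the safe-list approach breaks, here is a small Python 3 sketch with a made-up parameter value that contains '&' and '='. Quoting the assembled query with those characters marked safe leaves them looking like delimiters, while encoding each name and value separately (urlencode with quote_via=quote) keeps them distinct. The params list is hypothetical example data.

```python
from urllib.parse import quote, urlencode, parse_qsl

# Hypothetical query: the value of 'name' itself contains '&' and '='.
params = [('file', 'path to/file name.png'), ('name', 'Tom & Jerry = fun')]

# Assemble the raw query text, then quote it with '&' and '=' marked safe.
raw_query = '&'.join(k + '=' + v for k, v in params)
naive = quote(raw_query, safe='/&=')
# Parsing it back yields three pairs instead of two: the literal '&'
# and '=' inside the value were read as delimiters.
print(parse_qsl(naive))

# Encoding each name and value on its own turns '&' into %26 and '=' into %3D,
# so the query round-trips to the original pairs.
proper = urlencode(params, quote_via=quote, safe='/')
print(parse_qsl(proper))
```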
