Python3 - parse_qs doesn't separate arguments as expected

I'm using the urllib library in Python 3. The code:

from urllib.parse import parse_qs
parse_qs('https://www.example.com/?api-url=%2Fp%2Ftest-test-test-000761di%3Fajax%3Dtrue&api-params=%3Ft%3Dst-fs%26tc%3Dtrue')

returns the dictionary:

{
  'https://www.example.com/?api-url': ['/p/test-test-test-000761di?ajax=true'], 
  'api-params': ['?t=st-fs&tc=true']
}

Can someone explain to me how the dictionary is constructed?

Why are ...?api-url and &api-params keys, but ?ajax, ?t and &tc are not? Where can I read up on the topic?

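parse_qs() expects just the query string. You passed in a full URL, so everything before the first =, including the scheme, the host and the ?, was treated as the name of the first key.

If you pass in only the query string, you get: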

>>> parse_qs('api-url=%2Fp%2Ftest-test-test-000761di%3Fajax%3Dtrue&api-params=%3Ft%3Dst-fs%26tc%3Dtrue')
{'api-url': ['/p/test-test-test-000761di?ajax=true'], 'api-params': ['?t=st-fs&tc=true']}
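If you start from a full URL, as in the question, one way to get at the query string is urllib.parse.urlsplit(). The following is a minimal sketch, assuming the URL from the question; the variable names url, query and parsed are only illustrative:

```python
from urllib.parse import urlsplit, parse_qs

# The full URL from the question, split over two lines for readability.
url = ('https://www.example.com/?api-url=%2Fp%2Ftest-test-test-000761di%3Fajax%3Dtrue'
       '&api-params=%3Ft%3Dst-fs%26tc%3Dtrue')

# urlsplit() separates scheme, host, path, query and fragment;
# .query is the part after the first '?', still percent-encoded.
query = urlsplit(url).query

# parse_qs() now only sees the query string, so the keys come out clean.
parsed = parse_qs(query)
print(parsed)
```

The keys are then 'api-url' and 'api-params', without the scheme and host attached to the first one.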

This is the correct result for the given query string; the ?, = and & characters you see in the output are escaped in the input query string.

For example, the escaped value for api-params is %3Ft%3Dst-fs%26tc%3Dtrue; the correct interpretation is the unquoted value for that string, which is '?t=st-fs&tc=true'.
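To see that decoding step in isolation, here is a small sketch using urllib.parse.unquote(), which does the same percent-decoding on a single string (parse_qs() additionally treats + as a space). The names escaped and decoded are only illustrative:

```python
from urllib.parse import unquote

# The raw, percent-encoded value of api-params from the query string.
escaped = '%3Ft%3Dst-fs%26tc%3Dtrue'

# %3F -> '?', %3D -> '=', %26 -> '&'
decoded = unquote(escaped)
print(decoded)
```

Because the ?, = and & were encoded in the input, parse_qs() treats them as part of the value rather than as separators.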

You could then parse those values again to remove the second layer of query-string syntax, but you must first extract the query-string portion of each value:

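>>> parsed = parse_qs('api-url=%2Fp%2Ftest-test-test-000761di%3Fajax%3Dtrue&api-params=%3Ft%3Dst-fs%26tc%3Dtrue')
>>> parsed['api-url'][0].partition('?')[-1]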
'ajax=true'
>>> parse_qs(parsed['api-url'][0].partition('?')[-1])
{'ajax': ['true']}
>>> parsed['api-params'][0].partition('?')[-1]
't=st-fs&tc=true'
>>> parse_qs(parsed['api-params'][0].partition('?')[-1])
{'t': ['st-fs'], 'tc': ['true']}

I used str.partition() to split each string on the first ? character; the last element of the result is everything after that ?, which is then parsed as the query string.
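If you need this for every value, the two steps can be wrapped in a small function. This is only a sketch: parse_nested_query is a hypothetical helper name, not part of urllib, and it assumes each key has a single value whose nested query string follows the first ?:

```python
from urllib.parse import parse_qs

def parse_nested_query(value):
    """Hypothetical helper: parse whatever follows the first '?' in value."""
    # partition() returns (before, separator, after); only 'after' is needed.
    _, _, query = value.partition('?')
    # If there is no '?', query is '' and parse_qs() returns an empty dict.
    return parse_qs(query)

parsed = parse_qs('api-url=%2Fp%2Ftest-test-test-000761di%3Fajax%3Dtrue'
                  '&api-params=%3Ft%3Dst-fs%26tc%3Dtrue')

# Parse the second layer for the first (and here only) value of each key.
nested = {key: parse_nested_query(values[0]) for key, values in parsed.items()}
print(nested)
```

Note that the path portion of api-url (/p/test-test-test-000761di) is discarded by this helper; keep the first element of partition() if you need it.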
