I'm using the urllib library in Python 3. The code:
from urllib.parse import parse_qs
parse_qs('https://www.example.com/?api-url=%2Fp%2Ftest-test-test-000761di%3Fajax%3Dtrue&api-params=%3Ft%3Dst-fs%26tc%3Dtrue')
returns the dictionary:
{
'https://www.example.com/?api-url': ['/p/test-test-test-000761di?ajax=true'],
'api-params': ['?t=st-fs&tc=true']
}
Can someone explain to me how the dictionary is constructed? Why are ...?api-url and &api-params keys, but ?ajax, ?t and &tc aren't? Where can I read up on the topic?
parse_qs() expects just the query string. You passed in a full URL, so everything before the first = (the scheme, host, path and ?) ended up as part of the first key.
If you pass in only the query string, you get:
>>> parse_qs('api-url=%2Fp%2Ftest-test-test-000761di%3Fajax%3Dtrue&api-params=%3Ft%3Dst-fs%26tc%3Dtrue')
{'api-url': ['/p/test-test-test-000761di?ajax=true'], 'api-params': ['?t=st-fs&tc=true']}
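When you start from a full URL, as in the question, one way to obtain just the query string is urllib.parse.urlsplit(). The following is a minimal sketch, assuming the URL from the question; the variable names url, query and parsed are only illustrative:

```python
from urllib.parse import urlsplit, parse_qs

url = ('https://www.example.com/'
       '?api-url=%2Fp%2Ftest-test-test-000761di%3Fajax%3Dtrue'
       '&api-params=%3Ft%3Dst-fs%26tc%3Dtrue')

# urlsplit() breaks the URL into scheme, netloc, path, query and fragment;
# only the query component is what parse_qs() expects.
query = urlsplit(url).query
parsed = parse_qs(query)

print(parsed)
# {'api-url': ['/p/test-test-test-000761di?ajax=true'], 'api-params': ['?t=st-fs&tc=true']}
```

The keys no longer contain the scheme and host, because those parts were split off before parsing.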
This is the correct result for the given query string; the ?, = and & characters you see in the output are escaped in the input query string.
For example, the escaped value for api-params is %3Ft%3Dst-fs%26tc%3Dtrue; the correct interpretation is the unquoted value for that string, which is '?t=st-fs&tc=true'.
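To see the escaping on its own, you can unquote that value directly with urllib.parse.unquote(). This is only an illustration of what parse_qs() does to each value after splitting on & and =; the variable names are illustrative:

```python
from urllib.parse import quote, unquote

escaped = '%3Ft%3Dst-fs%26tc%3Dtrue'

# %3F, %3D and %26 are the percent-encoded forms of ?, = and &
unescaped = unquote(escaped)
print(unescaped)  # ?t=st-fs&tc=true

# Quoting again with no "safe" characters reproduces the escaped form
requoted = quote(unescaped, safe='')
print(requoted)   # %3Ft%3Dst-fs%26tc%3Dtrue
```

Because the ?, = and & inside the value were escaped, parse_qs() never treats them as separators in the outer query string.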
You could then parse those values again, to remove the second layer of query-string syntax, but you must first split off the query-string part of each value (here parsed is the dictionary returned by parse_qs() above):
>>> parsed['api-url'][0].partition('?')[-1]
'ajax=true'
>>> parse_qs(parsed['api-url'][0].partition('?')[-1])
{'ajax': ['true']}
>>> parsed['api-params'][0].partition('?')[-1]
't=st-fs&tc=true'
>>> parse_qs(parsed['api-params'][0].partition('?')[-1])
{'t': ['st-fs'], 'tc': ['true']}
I used str.partition() to split the strings on the first ? character, and to get everything after that first character to be parsed as the query string.
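The same steps can be wrapped up in a small function. This is a rough sketch rather than a general-purpose solution: parse_embedded_query is a hypothetical helper name, and it assumes each value holds at most one embedded query string after its first ? character:

```python
from urllib.parse import parse_qs

def parse_embedded_query(value):
    """Hypothetical helper: parse whatever follows the first '?' in value."""
    # partition() returns (before, separator, after); only 'after' is needed
    _before, _sep, query = value.partition('?')
    return parse_qs(query)

parsed = parse_qs(
    'api-url=%2Fp%2Ftest-test-test-000761di%3Fajax%3Dtrue'
    '&api-params=%3Ft%3Dst-fs%26tc%3Dtrue'
)

# Parse the second layer for the first value of every key
nested = {key: parse_embedded_query(values[0]) for key, values in parsed.items()}
print(nested)
# {'api-url': {'ajax': ['true']}, 'api-params': {'t': ['st-fs'], 'tc': ['true']}}
```

A value without any ? yields an empty dictionary, since partition() then returns an empty string for the part after the separator.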