
How to parse and then unparse a URL query string so that it ends up in the same format/encoding as before?

Is there a way to take a URL, parse it to get the query, edit the query with Python, and then rebuild the URL so that it is exactly the same (same format, encoding, etc.)? Here is what I have tried using the urllib functions:

>>> from urllib.parse import urlparse, parse_qs, urlencode, urlunparse
>>> working_url
'https://<some-netloc>/reports/sales-order-history?page=&sort_direction=&sort_column=&filter%5Bsearch%5D=&filter%5Bofficial%5D%5B0%5D%5Bname%5D=status&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=Pending%2CProcessing%2CReady%20to%20ship%2CDelivering%2CDelivered%2CCompleted&filter%5Bofficial%5D%5B1%5D%5Bname%5D=orderDate&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z'
>>> working_parse = urlparse(working_url)
>>> working_parse
ParseResult(scheme='https', netloc='<some-netloc>', path='/reports/sales-order-history', params='', query='page=&sort_direction=&sort_column=&filter%5Bsearch%5D=&filter%5Bofficial%5D%5B0%5D%5Bname%5D=status&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=Pending%2CProcessing%2CReady%20to%20ship%2CDelivering%2CDelivered%2CCompleted&filter%5Bofficial%5D%5B1%5D%5Bname%5D=orderDate&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z', fragment='')
>>> working_query_dict = parse_qs(working_parse.query)

This is where I would edit working_query_dict, for instance to change those timestamps. I then use urlencode to encode the dictionary again and urlunparse to turn it back into a real working URL.

>>> working_query_dict
{'filter[official][0][name]': ['status'], 'filter[official][0][value]': ['Pending,Processing,Ready to ship,Delivering,Delivered,Completed'], 'filter[official][1][name]': ['orderDate'], 'filter[official][1][value]': ['2020-05-10T07:00:00.000Z,2020-05-18T06:59:59.999Z']}
>>> urlunparse((working_parse.scheme,working_parse.netloc,working_parse.path,working_parse.params,urlencode(working_query_dict),working_parse.fragment))
'https://<some-netloc>/reports/sales-order-history?filter%5Bofficial%5D%5B0%5D%5Bname%5D=%5B%27status%27%5D&filter%5Bofficial%5D%5B0%5D%5Bvalue%5D=%5B%27Pending%2CProcessing%2CReady+to+ship%2CDelivering%2CDelivered%2CCompleted%27%5D&filter%5Bofficial%5D%5B1%5D%5Bname%5D=%5B%27orderDate%27%5D&filter%5Bofficial%5D%5B1%5D%5Bvalue%5D=%5B%272020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z%27%5D'

However, the URL that gets formed doesn't work: it doesn't resolve to the same place on the website. Just by looking at it you can tell it has changed, even though I did not edit any of the parameters.

I'm thinking that maybe I need to detect the encoding or format when calling parse_qs, and then use that same format when calling urlencode. How can I do this?

OK, the key is the urlencode argument quote_via=urllib.parse.quote, which percent-encodes spaces as %20 instead of the default +. Additionally, parse_qs can be replaced with parse_qsl to preserve the order of the parameters (it also returns plain string values rather than lists), and passing keep_blank_values=True to that function keeps even the blank parameters in the result, if you want an exact match.
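To see what each of those options changes, here is a minimal sketch. The short query string is a made-up stand-in for the real one, chosen only because it contains blank values, brackets, and spaces; the variable names are illustrative.

```python
from urllib.parse import parse_qs, parse_qsl, quote, urlencode

# Hypothetical short query (not the one from the question).
query = "page=&filter%5Bsearch%5D=&status=Ready%20to%20ship"

# parse_qs drops blank values by default and wraps every value in a list.
as_dict = parse_qs(query)

# Without doseq=True, urlencode stringifies that list, so the brackets
# and quotes of the Python repr end up percent-encoded in the value.
lossy = urlencode(as_dict)

# parse_qsl keeps the original order and returns plain (name, value) pairs;
# keep_blank_values=True also keeps "page=" and "filter[search]=".
pairs = parse_qsl(query, keep_blank_values=True)

# The default quote_via is quote_plus, which writes spaces as "+".
plus_encoded = urlencode(pairs)

# quote_via=quote writes spaces as "%20", matching the original string.
round_tripped = urlencode(pairs, quote_via=quote)

print(lossy)
print(plus_encoded)
print(round_tripped == query)
```

The round trip is only byte-for-byte identical when the original query was encoded the way quote encodes it (uppercase hex digits, %20 for spaces, reserved characters such as commas escaped). A query that was encoded differently would come back normalized rather than identical.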

So now this works for me:

>>> from urllib.parse import quote, parse_qsl,urlencode
>>> urlencode(parse_qsl(working_parse.query,keep_blank_values=True),quote_via=quote) == working_parse.query
True

It takes a complicated query (whose parameters you could edit if you want), parses it, and urlencodes it back to the original query string.
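Putting the pieces together, editing one parameter and rebuilding the full URL might look like the sketch below. The helper name replace_query_value, the example.com URL, and the filter[date] key are all hypothetical and only serve the illustration.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlparse, urlunparse


def replace_query_value(url, key, new_value):
    """Return url with every parameter named key (decoded) set to new_value."""
    parts = urlparse(url)
    # Keep order and blank parameters so untouched parts survive unchanged.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    edited = [(k, new_value if k == key else v) for k, v in pairs]
    # quote (rather than the default quote_plus) encodes spaces as %20.
    new_query = urlencode(edited, quote_via=quote)
    return urlunparse(parts._replace(query=new_query))


# Hypothetical URL in the same style as the one in the question.
url = (
    "https://example.com/reports?page=&filter%5Bdate%5D="
    "2020-05-10T07%3A00%3A00.000Z%2C2020-05-18T06%3A59%3A59.999Z"
)
updated = replace_query_value(
    url, "filter[date]", "2020-06-01T07:00:00.000Z,2020-06-08T06:59:59.999Z"
)
print(updated)
```

Note that the key is matched in its decoded form (filter[date], not filter%5Bdate%5D), because parse_qsl decodes names as well as values.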
