简体   繁体   中英

python requests link headers

I'm trying to find best way to capture links listed under response headers, exactly like this one and I'm using python requests module. Below is link which has Link Headers section on Python Requests page: docs.python-requests.org/en/latest/user/advanced/

But, in my case my response headers contains links like below:

{'content-length': '12276', 'via': '1.1 varnish-v4', 'links': '<http://justblahblahblah.com/link8.html>;rel="last">,<http://justblahblahblah.com/link2.html>;rel="next">', 'vary': 'Accept-Encoding, Origin'}

Please notice > after "last" which is not the case under Requests examples and I just cant seem to figure out how to solve this.

There is already a way provided by requests to access links header

response.links

It returns the dictionary of links header value which can easily parsed further using

response.links['next']['url']

to get the required values.

You can parse the header's value manually. To make things easier you might want to use request's parsing function parse_header_links as a reference.

Or you can do some find/replace and use original parse_header_links

In [1]: import requests

In [2]: d = {'content-length': '12276', 'via': '1.1 varnish-v4', 'links': '<http://justblahblahblah.com/link8.html>;rel="last">,<http://justblahblahblah.com/link2.html>;rel="next">', 'vary': 'Accept-Encoding, Origin'}

In [3]: requests.utils.parse_header_links(d['links'].rstrip('>').replace('>,<', ',<'))
Out[3]:
[{'rel': 'last', 'url': 'http://justblahblahblah.com/link8.html'},
 {'rel': 'next', 'url': 'http://justblahblahblah.com/link2.html'}]

If there might be a space or two between >, and < then you need to do replace with a regular expression.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM