Python requests.get(url) times out but works in browser (Chrome); how can I tailor the request headers for a certain host?

I am trying to download a file using the Python requests module. My code works for some URLs/hosts, but I've come across one that does not work.

Based on other similar questions, it may be related to the User-Agent request header. I tried to remedy this by adding the Chrome User-Agent, but the connection still times out for this particular URL (it does work for others).

I have tested opening the URL in the Chrome browser (which works fine) and inspected the request headers, but I still can't figure out why my code is failing:

import requests
url = 'http://publicdata.landregistry.gov.uk/market-trend-data/house-price-index-data/Indices-2020-03.csv'
headers = {'Upgrade-Insecure-Requests': '1', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}

session = requests.Session()
session.headers.update(headers)
response = session.get(url, stream=True)
# !!! code fails here for this particular url !!!

# Stream the response body to disk in 1 KiB chunks.
with open('test.csv', "wb") as fh:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:  # skip empty chunks
            fh.write(chunk)

Update 2020-08-14: I have figured out what was wrong. In the cases where the code worked, the URLs used the https protocol. This URL uses http, and my proxy settings were configured only for https, not for http. After providing an http proxy to requests, my code worked as written.
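For anyone hitting the same thing, here is a minimal sketch of passing a proxy for both schemes explicitly. The proxy address (proxy.example.com:8080), the download helper and its session parameter are assumptions for illustration and are not from the original post. The proxies are passed on the request itself because the requests documentation notes that values set on session.proxies can be overridden by proxies from the environment.

```python
import requests

# Hypothetical proxy address (an assumption); replace with your own host and port.
PROXIES = {
    "http": "http://proxy.example.com:8080",   # used for http:// URLs
    "https": "http://proxy.example.com:8080",  # used for https:// URLs
}


def download(url, path, proxies, timeout=(5, 60), session=None):
    """Stream url to path, routing the request through the given proxies.

    requests chooses the proxy by the scheme of the requested URL, so a
    mapping without an "http" entry leaves plain-http URLs unproxied.
    """
    if session is None:
        session = requests.Session()
    response = session.get(url, stream=True, proxies=proxies, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    with open(path, "wb") as fh:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                fh.write(chunk)
    return path
```

Calling download(url, 'test.csv', PROXIES) with the url from the question should then go through the proxy for the http scheme as well, provided the proxy address is valid for your network.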

requests only enforces a timeout if you pass one: timeout=None, which is also the default, makes it wait indefinitely for the server. Here is the official documentation: https://requests.readthedocs.io/en/master/user/advanced/#timeouts
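When debugging, it may be more useful to set an explicit timeout and see which phase fails. A rough sketch follows; the try_get helper, its default timeout values and the injectable get parameter are illustrative assumptions, not part of requests.

```python
import requests


def try_get(url, connect_timeout=5, read_timeout=30, get=requests.get):
    """Return (response, None) on success, or (None, reason) on failure.

    The timeout tuple is (connect, read), both in seconds.
    """
    try:
        response = get(url, stream=True, timeout=(connect_timeout, read_timeout))
        return response, None
    except requests.exceptions.ConnectTimeout:
        # No connection could be established in time (often a proxy or firewall issue).
        return None, "connect timeout"
    except requests.exceptions.ReadTimeout:
        # Connected, but the server sent no data within read_timeout seconds.
        return None, "read timeout"
    except requests.exceptions.ConnectionError as exc:
        # DNS failures, refused connections, proxy errors and similar.
        return None, "connection error: {}".format(exc)
```

A "connect timeout" for an http URL while https URLs work points towards the network path (for example proxy settings) rather than the request headers.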

The code you posted worked for me; it saved the file (129007 lines). The host may be rate-limiting you, so try again later to see if it works.

# count lines 
$ wc -l test.csv 
129007 test.csv

# inspect headers
$ head -n 4 test.csv
Date,Region_Name,Area_Code,Index
1968-04-01,Wales,W92000004,2.11932727
1968-04-01,Scotland,S92000003,2.108087275
1968-04-01,Northern Ireland,N92000001,3.300419757
