
Python requests: handling certain HTTP responses

I am trying to get an HTTP response from a website using the requests module, but the response comes back with status code 410:

<Response [410]>

From the documentation, a 410 (Gone) status appears to mean that the resource is no longer available and that no forwarding address is known, possibly because the server has intentionally made it unavailable to clients. Is this indeed the case, or am I missing something? I am trying to confirm whether the webpage can be scraped at all:

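import requests

url = 'http://www.b2i.us/profiles/investor/ResLibraryView.asp?ResLibraryID=81517&GoTopage=3&Category=1836&BzID=1690&G=666'

try:
    response = requests.get(url)
    print(response)
except requests.exceptions.RequestException as e:
    # Raised for connection-level failures; a 410 still comes back as a normal response
    print(e)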
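Note that requests.get does not raise an exception for a 4xx status on its own, so the except branch above will not fire for a 410. The following is a minimal sketch, using only the standard library, of turning a numeric status code into a readable summary. The helper name describe_status is hypothetical and not part of requests.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a readable summary of an HTTP status code (hypothetical helper)."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The code is not one of the statuses known to the standard library
        return f"{code}: not a registered HTTP status code"
    kind = "error" if status.value >= 400 else "non-error"
    return f"{status.value} {status.phrase} ({kind})"


print(describe_status(410))  # 410 Gone (error)
print(describe_status(200))  # 200 OK (non-error)
```

With requests, the value to pass in would be response.status_code. Alternatively, calling response.raise_for_status() converts a 4xx or 5xx response into requests.exceptions.HTTPError, which is a subclass of RequestException and would therefore be caught by the except clause.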

Some websites don't respond well to HTTP requests that send 'python-requests' as the User-Agent string.
You can get a 200 OK response if you set the User-Agent header to a browser-like value such as 'Mozilla/5'.

import requests

url = 'http://www.b2i.us/profiles/investor/ResLibraryView.asp?ResLibraryID=81517&GoTopage=3&Category=1836&BzID=1690&G=666'
# Override the default 'python-requests/...' User-Agent with a browser-like one
headers = {'User-Agent': 'Mozilla/5'}
response = requests.get(url, headers=headers)
print(response)

<Response [200]>
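To confirm which User-Agent is actually sent, the request can be prepared without sending it and its headers inspected. This is a rough sketch rather than a definitive diagnostic; the helper name build_prepared_request is an assumption made for illustration, while Session.prepare_request and Request are part of requests.

```python
import requests

URL = ('http://www.b2i.us/profiles/investor/ResLibraryView.asp'
       '?ResLibraryID=81517&GoTopage=3&Category=1836&BzID=1690&G=666')


def build_prepared_request(url, user_agent=None):
    """Prepare (but do not send) a GET request so its headers can be inspected."""
    session = requests.Session()
    headers = {'User-Agent': user_agent} if user_agent else None
    request = requests.Request('GET', url, headers=headers)
    # prepare_request merges the session's default headers with the ones given here
    return session.prepare_request(request)


default = build_prepared_request(URL)
custom = build_prepared_request(URL, user_agent='Mozilla/5')

print(default.headers['User-Agent'])  # starts with 'python-requests/'
print(custom.headers['User-Agent'])   # Mozilla/5
```

After a real request, the same information is available as response.request.headers, which makes it possible to compare what two different machines send.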

This works on macOS, but I am having issues with the same approach on Windows, in a VMware virtual machine that I run automated tasks from. Why would the behavior be different? Is there a separate workaround for Windows machines?

