
requests.get() returns 200 but body is empty

I have a fairly large Python 3 import script, part of which fetches a URL and parses the response body.

The code looks like this:

 import requests
 url = 'http://...'  # some URL here that returns an HTML page with curl
 req = requests.get(url)

 print("--- status_code %s" % req.status_code)
 print("--- body length %s" % len(req.text))

I am getting:

 --- status_code 200
 --- body length 0

Looking at the headers I see this:

 {'Keep-Alive': 'timeout=5, max=100', 'Content-Length': '0', 'Date': 'Mon, 06 Nov 2017 03:14:49 GMT', 'Server': 'Apache/2.4.18 (Ubuntu)', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html; charset=utf-8'}

I've searched everywhere for why the Content-Length is 0, but I have not been able to figure it out.

To test this in isolation, I created a small script that requests the same URL with the same snippet. This test script works fine!

Why does one script work but not the other? I have read that requests.get() is blocking by default, so it should work in both cases. Is there anything I am missing?

How many times and how often do you try to access the server from the main script and from the snippet?

If you scrape an external site, it may become "angry" and return zero-sized content. This is quite a common measure to prevent site grabbing. In this scenario, your test script would work fine as long as it is executed only once or twice. Your main script, however, may be restricted by the site for some amount of time once it exceeds a certain number of requests (five, or ten, or ten per second).

If that's the case, you can try inserting a delay between requests in your script.
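
A minimal sketch of the delay idea, assuming a fixed pause between requests is enough (the right interval depends on the site and is not known here). The helper name fetch_all_politely and its parameters are hypothetical; fetch can be any callable that takes a URL, such as requests.get.

```python
import time


def fetch_all_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive calls.

    fetch is any callable taking a URL (for example requests.get).
    sleep is injectable so the pause can be replaced in tests.
    """
    results = []
    for index, url in enumerate(urls):
        # No pause before the first request, only between requests.
        if index > 0:
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

With requests installed, this could be called as fetch_all_politely(urls, requests.get, delay_seconds=1.0), where urls is your own list of addresses.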

I figured this out. The issue was my own stupidity. The URL I was trying to fetch contained a "\n" character in the query string, which was causing the page to throw an error. Thanks, Klaus, for reminding me to check the server.
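
For anyone hitting the same problem, here is a sketch of a guard against stray newlines in a URL, which can creep in when URLs are read line by line from a file. The function name clean_url is hypothetical and not part of requests, and http://example.com/ is only a placeholder. It assumes that trimming surrounding whitespace is safe and that a control character inside the URL indicates a bug.

```python
def clean_url(raw_url):
    """Trim surrounding whitespace and refuse control characters inside the URL."""
    url = raw_url.strip()
    for ch in url:
        # ASCII control characters (including "\n", "\r", "\t") never belong in a URL.
        if ord(ch) < 32 or ord(ch) == 127:
            raise ValueError("control character %r in URL %r" % (ch, url))
    return url


# repr() makes an otherwise invisible trailing newline visible when debugging.
print(repr("http://example.com/page?q=1\n"))
print(repr(clean_url("http://example.com/page?q=1\n")))
```

Printing repr(url) just before the requests.get(url) call in both scripts is a quick way to see whether they really request the same URL.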
