I think I figured out the issue. I'm using the requests_cache module, and I am most likely still being served, from the cache, the empty response this URL returned many tests ago. In other words, the cache is holding onto a bad response, and I keep silently reusing that cached response.
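To illustrate the update above, here is a minimal, network-free sketch of how a response cache can keep replaying a bad (empty) body. `ResponseCache`, `make_flaky_fetch`, and the `should_cache` hook are hypothetical names for illustration only; this is a toy, not how requests_cache is implemented. In requests_cache itself, the corresponding remedies would be clearing the cache (`requests_cache.clear()` after `install_cache()`, or `session.cache.clear()` on a `CachedSession`), checking `response.from_cache` to see whether a response was replayed, or passing a `filter_fn` so that responses with empty bodies are never stored.

```python
from typing import Callable, Dict, Optional


class ResponseCache:
    """Toy stand-in for an HTTP response cache, keyed by the exact URL string."""

    def __init__(self, fetch: Callable[[str], str],
                 should_cache: Optional[Callable[[str], bool]] = None):
        self._fetch = fetch
        self._store: Dict[str, str] = {}
        # By default every body is stored, which is how an empty one gets stuck.
        self._should_cache = should_cache or (lambda body: True)

    def get(self, url: str, refresh: bool = False) -> str:
        if not refresh and url in self._store:
            return self._store[url]  # silently replay whatever was stored
        body = self._fetch(url)
        if self._should_cache(body):
            self._store[url] = body
        return body

    def clear(self) -> None:
        self._store.clear()


def make_flaky_fetch() -> Callable[[str], str]:
    """Hypothetical fetcher: returns '' on the first call, a page afterwards."""
    calls = {"n": 0}

    def fetch(url: str) -> str:
        calls["n"] += 1
        return "" if calls["n"] == 1 else "<html>player page</html>"

    return fetch


url = "https://www.basketball-reference.com/players/c/cookqu01.html"

naive = ResponseCache(make_flaky_fetch())
print(repr(naive.get(url)))  # ''
print(repr(naive.get(url)))  # '' again: the empty body was cached

guarded = ResponseCache(make_flaky_fetch(),
                        should_cache=lambda body: bool(body.strip()))
print(repr(guarded.get(url)))  # '' (not stored)
print(repr(guarded.get(url)))  # '<html>player page</html>'
```

Refusing to store empty bodies (the `should_cache` check) prevents the bad entry from being created, while `refresh=True` or `clear()` evicts one that is already stuck.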
I am trying to run the following code, and it's tripping up because the `requests.get(url)` portion is coming back empty. I'm looping through a bunch of URLs and the others work fine; something about this one makes it keep behaving this way.
```python
import requests

urls = ['http://www.basketball-reference.com/players/b/brogdma01.html',
        'http://www.basketball-reference.com/players/b/brokaga01.html',
        'http://www.basketball-reference.com/players/b/brookpr01.html',
        'http://www.basketball-reference.com/players/b/brookcl01.html',
        'http://www.basketball-reference.com/players/b/brookaa01.html',
        'https://www.basketball-reference.com/players/c/cookqu01.html']

for url in urls:
    # Fetch each player page; cookqu01 is the one that comes back empty
    page_source = requests.get(url).text
```
The problem URL is this one: https://www.basketball-reference.com/players/c/cookqu01.html
Does anyone know why this one would behave differently?
The only difference I see is that your non-working URL is the only one that uses "https" instead of "http".
It's the same site, but this could be your problem.
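Following up on the scheme observation: a response cache such as requests_cache builds its keys from the request, including the full URL, so the `http://` and `https://` versions of a page are stored as separate entries. One way to rule the scheme out is to normalize every URL to a single scheme before requesting it. This is a sketch; `normalize_scheme` is a hypothetical helper, and it assumes the site serves the same pages over both schemes.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_scheme(url: str, scheme: str = "https") -> str:
    """Return url with its scheme replaced; host, path, query, and fragment are kept."""
    parts = urlsplit(url)
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


urls = [
    "http://www.basketball-reference.com/players/b/brogdma01.html",
    "https://www.basketball-reference.com/players/c/cookqu01.html",
]

# Before: the list mixes two schemes
print(sorted({urlsplit(u).scheme for u in urls}))  # ['http', 'https']

# After: every URL uses the same scheme, so each page maps to one cache key
normalized = [normalize_scheme(u) for u in urls]
print(sorted({urlsplit(u).scheme for u in normalized}))  # ['https']
```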