Unable to download file from URL using python

Question

I am trying to download the file from the URL:

https://www.cmegroup.com/content/dam/cmegroup/notices/clearing/2020/08/Chadv20-239.pdf

I tried the Python requests library, but the request just timed out. Setting the 'User-Agent' header to my browser's value did not help, and neither did copying every single header from my browser into my Python script or setting allow_redirects=True. I've also tried wget and curl. Everything fails except actually opening the browser, visiting the URL, and downloading the file.

I'm wondering what the actual difference is between the request my browser sends and the Python request whose headers I set to match. Is there any way I can download this file using Python?

Code snippet:

import requests
requests.get("https://www.cmegroup.com/content/dam/cmegroup/notices/clearing/2020/08/Chadv20-239.pdf") # hangs

Answer 1

It is difficult to tell what is going wrong without a code snippet. How is the file being downloaded? Are you getting the raw response content and saving it as a PDF? The official docs ( https://docs.python-requests.org/en/latest/user/quickstart/#raw-response-content ) suggest a chunk-based approach for saving streamed/raw content. Did you try that approach?
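For reference, the chunk-based approach from the docs could look roughly like the sketch below. The helper names save_streamed_response and download, the chunk size, and the timeout values are my own choices for illustration, not anything from the question. Streaming on its own is not guaranteed to fix a request that hangs; the timeout only makes it raise an exception instead of waiting forever.

```python
def save_streamed_response(response, dest, chunk_size=8192):
    """Write a streamed response to dest chunk by chunk; return bytes written.

    `response` is expected to behave like requests.Response, i.e. to offer
    raise_for_status() and iter_content().
    """
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    written = 0
    with open(dest, "wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                written += fh.write(chunk)
    return written


def download(url, dest, headers=None, timeout=(5, 30)):
    """Hypothetical wrapper: stream url to dest using requests."""
    import requests  # third-party; imported here so the helper above has no hard dependency

    # stream=True defers downloading the body until iter_content() is consumed.
    # timeout is (connect, read) in seconds; the values are assumptions.
    with requests.get(url, headers=headers, stream=True, timeout=timeout) as response:
        return save_streamed_response(response, dest)
```

It would be called as download(url, "Chadv20-239.pdf", headers={"User-Agent": "..."}), with the header value copied from your browser.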

Answer 2

Try this; it worked for me.

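import requests

# Browser-like User-Agent header
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'}
# timeout=30 is an addition: raise an exception instead of hanging indefinitely
response = requests.get(
    "https://www.cmegroup.com/content/dam/cmegroup/notices/clearing/2020/08/Chadv20-239.pdf", headers=headers, timeout=30)
response.raise_for_status()  # fail on an HTTP error rather than saving an error page
# Write the body in binary mode; the with block closes the file
with open("Chadv20-239.pdf", 'wb') as pdf:
    pdf.write(response.content)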

2 answers

solution1
1 2021-12-15 09:58:02

solution2
1 ACCEPTED 2021-12-15 10:44:29
