How to download pdf files using Python?

Question

I was looking for a way to download PDF files in Python, and I saw answers to other questions recommending the urllib module. I tried to download a PDF file with it, but when I try to open the downloaded file, a message says that the file cannot be opened.

This is the code I used:

import urllib
urllib.urlretrieve("http://papers.gceguide.com/A%20Levels/Mathematics%20(9709)/9709_s11_qp_42.pdf", "9709_s11_qp_42.pdf")

What am I doing wrong? Also, the file is automatically saved to the directory my Python file is in. How do I change the location it gets saved to?

Edit: I tried again with a link to a sample PDF, http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf

The code works with this link, so why won't it work for the other one?

Answer 1

Try this. It works.

import requests

url = 'https://pdfs.semanticscholar.org/c029/baf196f33050ceea9ecbf90f054fd5654277.pdf'
r = requests.get(url, stream=True)

# Write the response body to a file at the chosen path
with open('C:/Users/MICRO HARD/myfile.pdf', 'wb') as f:
    f.write(r.content)
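
The question also asks how to change where the file is saved. With urllib, the second argument to urlretrieve is the destination path, so passing a full path instead of a bare file name is enough. The following is a minimal sketch for Python 3, where urlretrieve lives in urllib.request. The helper name download_to and the fallback name "download.pdf" are made up for illustration.

```python
import os
from urllib.parse import unquote, urlsplit
from urllib.request import urlretrieve  # Python 3 location of urlretrieve


def download_to(url, directory):
    """Download url into directory, keeping the file name from the URL.

    download_to is a hypothetical helper, not part of urllib.
    """
    # Create the target directory if it does not exist yet.
    os.makedirs(directory, exist_ok=True)
    # Use the last path segment of the URL as the file name and decode
    # escapes such as %20. "download.pdf" is an arbitrary fallback.
    filename = unquote(os.path.basename(urlsplit(url).path)) or "download.pdf"
    destination = os.path.join(directory, filename)
    # The second argument controls where the file is written.
    urlretrieve(url, destination)
    return destination


# Example call (needs network access):
# download_to("http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf", "downloads")
```

This only changes where the bytes are written. It does not change what the server sends, so a URL that returns a web page instead of a PDF will still produce a file that cannot be opened.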

Answer 2

You can also use the wget package (installed with pip install wget) to download PDFs from a link:

import wget

wget.download(link)

Here's a guide about how to search & download all pdf files from a webpage in one go: https://medium.com/the-innovation/notesdownloader-use-web-scraping-to-download-all-pdfs-with-python-511ea9f55e48

Answer 3

You can't download the PDF content from the given URL using requests or urllib,
because the URL first points to another web page, and only after that does it load the PDF.
If in doubt, save the response as HTML instead of PDF and open it.
You need a headless browser such as PhantomJS to download files from this kind of web page.
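
A quick way to check what was actually downloaded, without renaming the file, is to look at its first bytes: a PDF file begins with the signature %PDF-, while a web page usually begins with markup such as <!DOCTYPE or <html. The helper names below (looks_like_pdf, peek) are hypothetical, and this is only a rough sketch of the check.

```python
def looks_like_pdf(path):
    """Return True if the file starts with the PDF signature b"%PDF-"."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


def peek(path, size=200):
    """Return the first bytes of the file, to see what the server sent."""
    with open(path, "rb") as f:
        return f.read(size)


# Example use on a file that was already downloaded:
# if not looks_like_pdf("9709_s11_qp_42.pdf"):
#     print(peek("9709_s11_qp_42.pdf"))
```

If peek shows HTML, the server returned a page rather than the document itself, which matches the behaviour described above.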
