I have been trying to download a PDF file using requests but, no matter what I do, the server keeps returning status 403 and the downloaded file cannot be opened as a PDF.
Here is the code I am running:
import requests
url_pdf='https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf'
#session = requests.Session()
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36",
    "Accept": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "Cache-Control": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "host-header": "6b7412fb82ca5edfd0917e3957f05d89",
    "Accept-Encoding": "gzip, deflate, br",
    "cache-control": "image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8",
    "Connection": "keep-alive",
    "referer":"https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf"
}
req=requests.get(url_pdf, headers=headers)
print(req.status_code)
with open("bologna.pdf", 'wb') as f:
    f.write(req.content)
f.closed
As you can see, I have tried using a 'Session' object, setting (different) 'User-Agent' as well as other headers but nothing seems to work.
I have also tried using
import os
name='bologna.pdf'
os.system('wget {} -O {}'.format(url_pdf,name))
But that does not work either.
Do you have any idea what I could do to overcome this problem? I am really struggling to figure it out.
Thank you a lot!
Avoid sending custom headers unless they are required. Try the anonymous defaults first (the server still sees your IP details); with them the download takes only about 2 seconds:
curl -o bologna.pdf https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf
This works on my Windows 7 machine with curl added, and should work natively on Windows 10 or 11:
>curl -o bologna.pdf https://www.agerborsamerci.it/wp-content/uploads/2022/01/Settimanale-n.-2-del-20-Gennaio-2022-%E2%80%93-Listino-Borsa-n.-2.pdf
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  346k  100  346k    0     0   117k      0  0:00:02  0:00:02 --:--:--  117k
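The same idea can be sketched in Python: let requests send only its default headers, and write the file only when the server answers with a success status. This is a minimal sketch, not a confirmed fix. The helper name download_pdf and its injectable session parameter are assumptions made for illustration, and the answer above only tested curl, so whether this server accepts requests' default User-Agent is not verified here.

```python
def download_pdf(url, path, session=None, timeout=30):
    """Download url to path using the client's default headers.

    download_pdf is a hypothetical helper, not part of requests.
    Returns the number of bytes written.
    """
    if session is None:
        # Imported lazily so a caller can inject any object that offers
        # a requests-style get() method instead of a real session.
        import requests
        session = requests.Session()

    # No headers argument: only the library defaults are sent.
    response = session.get(url, timeout=timeout)

    # Raises an exception for 4xx/5xx answers, so the body of a 403
    # response is never saved to disk as if it were the PDF.
    response.raise_for_status()

    with open(path, "wb") as f:
        f.write(response.content)
    return len(response.content)
```

Calling download_pdf(url_pdf, "bologna.pdf") with the URL from the question would then either save the file or raise an error that names the status code, instead of silently writing an unreadable file.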
A 403 error means that you do not have permission to access the page.
Per the link above,
The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it.
I would recommend finding out which permissions are required to access that site or page.
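This also explains why the saved file cannot be opened: when the status is 403, the response body is whatever the server sent along with the refusal, not the PDF. A small sketch of a check to run before saving is shown here. The names looks_like_pdf and check_download are hypothetical helpers; the only fact relied on is that a PDF file begins with the bytes %PDF-.

```python
PDF_SIGNATURE = b"%PDF-"  # every PDF file starts with these bytes


def looks_like_pdf(data: bytes) -> bool:
    """Return True if data begins with the PDF file signature."""
    return data.startswith(PDF_SIGNATURE)


def check_download(status_code: int, body: bytes) -> str:
    """Describe whether a response is safe to save as a PDF."""
    if status_code == 403:
        return "403 Forbidden: the server refused the request, so the body is not the PDF"
    if status_code != 200:
        return f"unexpected status {status_code}"
    if not looks_like_pdf(body):
        return "status 200, but the body does not start with %PDF-"
    return "ok"
```

With the code from the question, check_download(req.status_code, req.content) would report the 403 before anything is written to bologna.pdf.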