Python - Scraping PDF files from a URL

Question

I want to scrape PDF files from this site (https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_pilote_7b.pdf). I tried the code below, but it doesn't work. Can anyone tell me why?

import requests

res = requests.get('https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_7b.pdf')
with open('C:\\Users\\sioud\\Desktop\\Manuels scolaires TN\\1\\test.pdf', 'wb') as f:
    f.write(res.content)

Answer 1

import requests

# Request the PDF itself rather than the Reader.php viewer page
res = requests.get('https://www.sigmaths.net/manuels/ph/physique_7b.pdf', stream=True)
with open('test.pdf', 'wb') as f:
    f.write(res.content)

Your URL points to a reader page (https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_7b.pdf), not to the PDF itself. Remove the 'Reader.php?var=' part to get the URL of the actual PDF.
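If there are many reader links to convert, the rewrite described above can be done programmatically. The following is only a sketch: the helper name reader_to_pdf_url is hypothetical, and it assumes every reader link carries the PDF path in a `var` query parameter relative to the site root, as in the URL from the question.

```python
from urllib.parse import urlsplit, parse_qs, urljoin


def reader_to_pdf_url(reader_url):
    """Turn a Reader.php?var=<path> link into the direct URL of the PDF.

    Hypothetical helper: assumes the 'var' parameter holds the PDF path
    relative to the site root, as in the URL from the question.
    """
    parts = urlsplit(reader_url)
    # parse_qs returns a dict of lists, e.g. {'var': ['manuels/ph/physique_7b.pdf']}
    pdf_path = parse_qs(parts.query)["var"][0]
    # Resolve the relative path against the site root
    return urljoin(f"{parts.scheme}://{parts.netloc}/", pdf_path)


print(reader_to_pdf_url(
    "https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_7b.pdf"
))
# -> https://www.sigmaths.net/manuels/ph/physique_7b.pdf
```

The resulting URL can then be passed to requests.get as in the code above.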

Answer 2

You can also use urlretrieve. See my solution code.

from urllib.request import urlretrieve

pdfurl = "https://www.sigmaths.net/manuels/ph/physique_7b.pdf"
urlretrieve(pdfurl, "test.pdf")  # saves the file as test.pdf in the current directory

You will then find the required PDF downloaded as test.pdf.
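Whichever download method is used, it can help to confirm that the server really returned a PDF and not an HTML viewer page, which is presumably what the code in the question ended up saving. A minimal sketch, relying on the fact that PDF files begin with the bytes `%PDF-`; the names looks_like_pdf and download_pdf are hypothetical and not part of either answer.

```python
from urllib.request import urlopen

PDF_MAGIC = b"%PDF-"  # signature at the start of every PDF file


def looks_like_pdf(data):
    """Return True if the bytes start with the PDF file signature."""
    return data.startswith(PDF_MAGIC)


def download_pdf(url, filename):
    """Download url to filename, refusing anything that is not a PDF.

    Hypothetical helper: reads the whole response into memory, which is
    fine for small files but not for very large ones.
    """
    with urlopen(url) as response:
        data = response.read()
    if not looks_like_pdf(data):
        raise ValueError(f"{url} did not return a PDF (starts with {data[:20]!r})")
    with open(filename, "wb") as f:
        f.write(data)
```

Calling download_pdf("https://www.sigmaths.net/manuels/ph/physique_7b.pdf", "test.pdf") would write the file only if the response starts with the PDF signature. With a reader-page URL that returns HTML, it would raise ValueError instead of silently saving a broken test.pdf.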

Answer 1
1, accepted, 2021-01-28 18:35:52

Answer 2
0, 2021-01-30 09:07:38
