
Why can't I download the PDF using this code?

import requests
from bs4 import BeautifulSoup
from urllib.request import urlretrieve

url='https://www.go100.com.tw/exam_download_3.php'
response = requests.get(url)
response.encoding = 'utf-8'

soup = BeautifulSoup(response.text, 'html.parser')
#print(soup)
all_as = soup.find_all('a')

for index, a_tag in enumerate(all_as):
    if 'pdf' in a_tag['href']:  
        #print(a_tag['href'])
        urlretrieve(a_tag['href'], 'file_tmp.pdf')
        break

It raises a ValueError, and I can't figure out what the problem is.
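The error can be reproduced without any network access. urlretrieve passes its URL to urlopen, which wraps a string URL in a Request object, and Request raises ValueError ("unknown url type") when the URL has no scheme such as https://. The path /files/exam.pdf below is a hypothetical relative href; the actual links on the page are not shown in the question.

```python
from urllib.request import Request


def open_error(url):
    """Return the ValueError message urllib gives for url, or None if the URL is accepted."""
    try:
        Request(url)  # only parses the URL; no connection is made
    except ValueError as exc:
        return str(exc)
    return None


# Hypothetical relative href, like the ones the question's loop hands to urlretrieve
print(open_error('/files/exam.pdf'))
# The same path made absolute is accepted
print(open_error('https://www.go100.com.tw/files/exam.pdf'))
```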

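You've already done 90% of the work. The href values are relative links, which urlretrieve can't open on their own, so you need to resolve them against the page URL with urljoin from urllib.parse: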

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://www.go100.com.tw/exam_download_3.php'
response = requests.get(url)
response.encoding = 'utf-8'

soup = BeautifulSoup(response.text, 'html.parser')
all_as = soup.find_all('a')

for a_tag in all_as:
    href = a_tag.get('href', '')  # some <a> tags have no href at all
    if 'pdf' in href:
        pdf_url = urljoin(url, href)  # turn the relative link into an absolute URL
        print(href)
        print(pdf_url)
        pdf_response = requests.get(pdf_url)
        with open('file_tmp.pdf', 'wb') as f:  # the with block closes the file
            f.write(pdf_response.content)
        break
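urljoin resolves an href the same way a browser does, relative to the page the link appears on, and leaves an already absolute URL unchanged. The href values below are hypothetical examples of the usual shapes; the real ones on go100.com.tw may differ.

```python
from urllib.parse import urljoin

page = 'https://www.go100.com.tw/exam_download_3.php'

# Hypothetical href values covering the common shapes a link can take
examples = [
    '/upload/exam.pdf',              # root-relative: joined to the site root
    'upload/exam.pdf',               # relative to the page's directory
    'https://example.com/exam.pdf',  # already absolute: returned unchanged
]

for href in examples:
    print(href, '->', urljoin(page, href))
```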

You can download all of these files using only requests:

import re
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://www.go100.com.tw/exam_download_3.php'

# Reuse one session (and its connection) for every request
s = requests.Session()

r = s.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

# Collect every href that points to a PDF; 'a[href]' skips <a> tags without an href
links = [a.get('href') for a in soup.select('a[href]') if '.pdf' in a.get('href')]

# Make relative links absolute (absolute ones pass through unchanged) and drop duplicates
correct_links = {urljoin(url, link) for link in links}

for link in correct_links:
    r = s.get(link)
    # Build a file name from the URL by removing every non-alphanumeric character
    name = re.sub(r'[^a-zA-Z0-9]', '', link)
    with open(f"{name}.pdf", "wb") as f:
        f.write(r.content)
    print(f"saved {name}")

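This saves every downloadable PDF into the current working directory (the folder you run the script from), each under a name derived from its URL. Requests documentation: https://requests.readthedocs.io/en/latest/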
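The names come from the re.sub call, which deletes every character that is not an ASCII letter or digit from the full URL, after which .pdf is added back. Below is a sketch with a hypothetical URL, alongside an alternative that is not part of the answer: keeping only the last path segment, percent-decoded. The alternative gives shorter, readable names, but two PDFs with the same file name in different folders would overwrite each other.

```python
import os
import re
from urllib.parse import unquote, urlparse


def pdf_filename(link):
    """Naming scheme from the answer: the whole URL with non-alphanumerics removed."""
    return re.sub(r'[^a-zA-Z0-9]', '', link) + '.pdf'


def basename_filename(link):
    """Alternative (not from the answer): last path segment, percent-decoded."""
    return os.path.basename(unquote(urlparse(link).path))


# Hypothetical PDF URL, used only for illustration
link = 'https://www.go100.com.tw/upload/exam.pdf'
print(pdf_filename(link))
print(basename_filename(link))
```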


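Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or link to the original. For any questions, contact yoyou2525@163.com.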

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM