Modifying the url parameter to download images from multiple web-sites

I was trying to download the images for every case listed in the CaseIDs array, but it doesn't work. I want the code to run for all of the cases.

from bs4 import BeautifulSoup
import requests as rq
from urllib.parse import urljoin
from tqdm import tqdm

CaseIDs = [100237, 99817, 100271]

with rq.session() as s:
    for caseid in tqdm(CaseIDs):
        url = 'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID= {caseid}'
        r = s.get(url)
        soup = BeautifulSoup(r.text, "html.parser")

        url = urljoin(url, soup.find('a', text='Text and Images Only')['href'])
        r = s.get(url)
        soup = BeautifulSoup(r.text, "html.parser")

        links = [urljoin(url, i['src']) for i in soup.select('img[src^="GetBinary.aspx"]')]

        count = 0
        for link in links:
            content = s.get(link).content
            with open("test_image" + str(count) + ".jpg", 'wb') as f:
                f.write(content)
            count += 1

You need to use an f-string to pass your caseid value in, as you're trying to do:

url = f'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID= {caseid}'

(You probably also need to remove the space between the = and the {.)
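
To see why the original line failed: without the f prefix, Python leaves {caseid} as literal text, so every iteration requested the same URL. Here is a minimal sketch that needs no network access; BASE and build_case_url are names made up for illustration, not part of the original code.

```python
# Base of the case page URL from the question, without the CaseID value.
BASE = 'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID='


def build_case_url(caseid):
    """Hypothetical helper: interpolate the case ID with an f-string."""
    return f'{BASE}{caseid}'


caseid = 100237

# No f prefix: the braces are kept verbatim and caseid is never inserted.
plain = BASE + ' {caseid}'
print(plain)

# With the f prefix (and no stray space) the value is substituted.
print(build_case_url(caseid))
```

Printing both strings side by side makes the difference easy to spot before any request is sent.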

Try using format() like this:

url = 'https://crashviewer.nhtsa.dot.gov/nass-CIREN/CaseForm.aspx?xsl=main.xsl&CaseID={}'.format(caseid)
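
Once the URL is built correctly, note that the loop in the question resets count to 0 for each case and the file name does not include the case ID, so later cases would overwrite test_image0.jpg, test_image1.jpg and so on. One possible way around that, as a sketch only: put the case ID in the file name. image_filename and its naming pattern are assumptions for illustration.

```python
def image_filename(caseid, index):
    """Hypothetical helper: a file name that is unique per case and image."""
    return f"case_{caseid}_image{index}.jpg"


CaseIDs = [100237, 99817, 100271]

# Pretend each case has two images; every resulting name is distinct.
names = [image_filename(caseid, i) for caseid in CaseIDs for i in range(2)]
print(names)
```

In the question's inner loop this would replace the "test_image" + str(count) + ".jpg" expression with image_filename(caseid, count).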
