Downloading PDF files from a PHP server || saving unavailable files

Question

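I am trying to download PDFs (a few may be Word files, but that is rare) that sit on a PHP server. On the server, the files appear to be numbered from 1 to 14000. Each one can be downloaded with a link of the form http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=X, where X is a number in the range [1, 14000]. I use the code below for a single value such as X = 200, and I can then loop over every value in [1, 14000] to save all the files to a specific folder. If the PDF does not exist, the code currently creates a zero-byte PDF file for that X value. I am testing with the code below on about 20 values of X for which no PDF exists.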

import requests

urls = [('13980', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13980'),
        ('13981', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13981'),
        ('13982', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13982'),  
        ('13983', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13983'), 
        ('13984', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13984'), 
        ('13985', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13985'), 
        ('13986', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13986'), 
        ('13987', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13987'),
        ('13988', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13988'),
        ('13989', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13989'), 
        ('13990', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13990'), 
        ('13991', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13991'), 
        ('13992', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13992'), 
        ('13993', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13993'), 
        ('13994', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13994'), 
        ('13995', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13995'), 
        ('13996', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13996'), 
        ('13997', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13997'), 
        ('13998', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13998'), 
        ('13999', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=13999'), 
        ('14000', 'http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id=14000')]

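# Reuse a single session for every request.
s = requests.Session()

for number, url in urls:
    response = s.get(url)

    # The with block closes the file on exit, so no explicit f.close() is needed.
    with open("/Users/aartimalik/Downloads/test/" + number + "_phptest.pdf", "wb") as f:
        f.write(response.content)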

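This code saves 0-byte PDFs, because no PDF exists for these numbers. What I want is: save the PDF only if a file exists for that X, and otherwise report "no PDF file". I am not sure whether this can be done inside `with open`. Any help is appreciated. Thanks!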

Answer 1

The following works for Word files (it can be modified to include PDFs):

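import os

import requests

os.chdir("/Users/aartimalik/Documents/GitHub/revenue_procurement/pdfs")

# phpurldoc is a local module that defines `urls` as a list of (number, url) tuples.
from phpurldoc import urls

print(urls)

# Reuse a single session for every request.
s = requests.Session()

for number, url in urls:
    response = s.get(url)

    # Assumption: an ID with no file comes back without a Content-Disposition
    # header or with an empty body, so skip it instead of raising a KeyError.
    disposition = response.headers.get("Content-Disposition")
    if disposition is None or not response.content:
        print("No file for id " + number)
        continue

    # Take the file name after the last "=" and strip any surrounding quotes.
    h = disposition.split("=")[-1].strip('"')

    # File names ending in "x" are saved as .docx; everything else as .doc.
    extension = ".docx" if h.endswith("x") else ".doc"
    with open("./bidsummaries-doc/" + h + "_" + number + extension, "wb") as f:
        f.write(response.content)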
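
To address the original question directly (save a file only when one exists, otherwise report "no PDF file"), here is a minimal sketch rather than a definitive implementation. It assumes that a missing document shows up as an empty response body (the 0-byte files described in the question) or as a missing Content-Disposition header; that behaviour of the server is not confirmed. The names `build_url`, `target_name`, and `download_range` are hypothetical helpers introduced only for illustration.

```python
import os

BASE_URL = "http://ppmoe.dot.ca.gov/des/oe/awards/bidsum/dl.php?id={}"


def build_url(file_id):
    """Return the download URL for one numeric ID."""
    return BASE_URL.format(file_id)


def target_name(file_id, headers, content):
    """Return a file name for the response, or None if there is no file.

    Assumption: a missing document shows up as an empty body or as a
    missing Content-Disposition header.
    """
    disposition = headers.get("Content-Disposition")
    if not content or not disposition:
        return None
    # Take the text after the last "=" and drop quotes and any directory part.
    original = os.path.basename(disposition.split("=")[-1].strip().strip('"'))
    if not original:
        return None
    return "{}_{}".format(file_id, original)


def download_range(first, last, folder):
    """Save every existing file with an ID in [first, last] into folder."""
    import requests  # imported here so the helpers above work without it

    os.makedirs(folder, exist_ok=True)
    missing = []
    with requests.Session() as session:
        for file_id in range(first, last + 1):
            response = session.get(build_url(file_id), timeout=30)
            name = target_name(file_id, response.headers, response.content)
            if name is None:
                print("No PDF file for id", file_id)
                missing.append(file_id)
                continue
            with open(os.path.join(folder, name), "wb") as f:
                f.write(response.content)
    return missing
```

Calling `download_range(13980, 14000, "test")` would cover the IDs from the question and return the list of IDs that had no file. Because the file name comes from the server's header, PDFs and Word files keep their own extensions, so no `.doc` / `.docx` branch is needed.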

Solution 1 (0, accepted, 2022-12-31 00:37:55)