Python downloading PDF into a .zip

What I am trying to do is loop through a list of URLs, download a series of PDFs, and save them to a .zip file. For now I am testing the code with just one URL. The error I am getting is:

Traceback (most recent call last):
  File "I:\test_pdf_download_zip.py", line 36, in <module>
    zip_file(zipfile_name, url)
  File "I:\test_pdf_download_zip.py", line 30, in zip_file
    myzip.write(dowload_pdf(url))
TypeError: expected a string or other character buffer object

Does anyone know how to pass the downloaded PDF to the .zip correctly (avoiding the error above) so that I can append it, or whether this is possible at all?

import os
import zipfile
import requests

output = r"I:"

# File name of the zipfile
zipfile_name = os.path.join(output, "test.zip")

# Random test pdf
url = r"http://www.pdf995.com/samples/pdf.pdf"

def create_zipfile(zipfile_name):
    zipfile.ZipFile(zipfile_name, "w")

def dowload_pdf(url):
    response = requests.get(url, stream=True)
    with open('test.pdf', 'wb') as f:
        f.write(response.content)

def zip_file(zip_name, url):
    with open(zip_name,'a') as myzip:
        myzip.write(dowload_pdf(url))

if __name__ == "__main__":
    create_zipfile(zipfile_name)
    zip_file(zipfile_name, url)
    print("Done")

Your dowload_pdf() function saves a file but doesn't return anything, so myzip.write() is called with None, which is what raises the TypeError. You need to modify it so that it returns the file path, which is then passed to myzip.write(). Also, instead of hardcoding test.pdf, pass a unique path to your download function so you don't end up with multiple test.pdf entries in your archive.

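import requests

def dowload_pdf(url, path):
    # Download the PDF at url, save it to path, and return path
    response = requests.get(url)
    response.raise_for_status()  # raise on HTTP errors instead of saving an error page
    with open(path, 'wb') as f:
        f.write(response.content)
    return path  # this is what myzip.write() needs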
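Returning the path fixes the None, but zip_file() in the question still opens the archive with the built-in open(), so myzip.write() would append the path as plain text to test.zip instead of adding the PDF to the archive. Below is a minimal sketch of zip_file() rewritten around zipfile.ZipFile, assuming the same names as the question. zip_pdf_bytes() is a hypothetical extra helper showing ZipFile.writestr(), which stores already-downloaded bytes (for example response.content) directly and skips the temporary file on disk.

```python
import os
import zipfile


def zip_file(zip_name, pdf_path):
    """Add the file at pdf_path to the ZIP archive zip_name.

    Mode "a" creates the archive if it does not exist yet and appends
    to it otherwise, so a separate create_zipfile() step is not needed.
    """
    with zipfile.ZipFile(zip_name, "a") as myzip:
        # arcname stores only the file name, not the local directory path
        myzip.write(pdf_path, arcname=os.path.basename(pdf_path))


def zip_pdf_bytes(zip_name, arcname, data):
    """Hypothetical helper: write in-memory bytes into zip_name as arcname.

    Alternative to zip_file() that never writes the PDF to disk first.
    """
    with zipfile.ZipFile(zip_name, "a") as myzip:
        myzip.writestr(arcname, data)
```

Combined with the answer's dowload_pdf(), each URL can then be added with zip_file(zipfile_name, dowload_pdf(url, path)), where path is a distinct local file name per URL (for example "file_%d.pdf" % i while enumerating the list; that naming scheme is only an illustration). Distinct names matter because ZipFile.write() with a name that is already in the archive only emits a "Duplicate name" warning and adds a second entry rather than replacing the first.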

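Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.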
