
Python downloading PDF into a .zip

What I am trying to do is loop through a list of URLs to download a series of PDFs and save them to a .zip archive. At the moment I am testing the code with just one URL. The error I am getting is:

Traceback (most recent call last):
  File "I:\test_pdf_download_zip.py", line 36, in <module>
    zip_file(zipfile_name, url)
  File "I:\test_pdf_download_zip.py", line 30, in zip_file
    myzip.write(dowload_pdf(url))
TypeError: expected a string or other character buffer object

Does anyone know how to pass the downloaded .pdf to the .zip correctly (avoiding the error above) so that I can append it, or whether this is possible at all?

import os
import zipfile
import requests

output = r"I:"

# File name of the zipfile
zipfile_name = os.path.join(output, "test.zip")

# Random test pdf
url = r"http://www.pdf995.com/samples/pdf.pdf"

def create_zipfile(zipfile_name):
    zipfile.ZipFile(zipfile_name, "w")

def dowload_pdf(url):
    response = requests.get(url, stream=True)
    with open('test.pdf', 'wb') as f:
        f.write(response.content)

def zip_file(zip_name, url):
    with open(zip_name,'a') as myzip:
        myzip.write(dowload_pdf(url))

if __name__ == "__main__":
    create_zipfile(zipfile_name)
    zip_file(zipfile_name, url)
    print("Done")

Your dowload_pdf() function saves a file but does not return anything, so myzip.write() receives None, which is what raises the TypeError. Modify the function so that it returns the file path. Rather than hardcoding test.pdf, pass a unique path to the download function for each URL so that you don't end up with multiple test.pdf entries in your archive.

def dowload_pdf(url, path):
    response = requests.get(url, stream=True)
    with open(path, 'wb') as f:
        f.write(response.content)
    return path
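
Note also that zip_file() in the question opens the archive with the built-in open(), so even with a returned path, myzip.write(path) would write the path string into the file rather than add the PDF as an archive entry. A minimal sketch of one way to append the downloaded file, assuming the PDF has already been saved to disk by the function above; the name add_file_to_zip is hypothetical and not part of the original code:

```python
import os
import zipfile


def add_file_to_zip(zip_name, file_path):
    """Append the file at file_path to the archive at zip_name."""
    # Mode "a" appends to an existing archive and creates it if it is missing.
    with zipfile.ZipFile(zip_name, "a", zipfile.ZIP_DEFLATED) as myzip:
        # arcname stores only the file name, not the full directory path.
        myzip.write(file_path, arcname=os.path.basename(file_path))
```

With this in place, the call in the question would become something like add_file_to_zip(zipfile_name, dowload_pdf(url, "test.pdf")). When looping over several URLs, a different path would be passed for each one, for example one built from the loop index.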


 