Passing a file-like object to write() method of another file-like object

I am trying to get a large file from the web, and stream it directly into the zipfile writer provided by the zipfile module, something like:

from urllib.request import urlopen
from zipfile import ZipFile

zip_file = ZipFile("/a/certain/local/zip/file.zip","a")
entry = zip_file.open("an.entry","w")
entry.write( urlopen("http://a.certain.file/on?the=web") )

Apparently, this doesn't work because .write accepts a bytes argument, not an I/O reader. However, since the file is rather large I don't want to load the whole file into RAM before compressing it.

The simple solution is to use bash (never really tried, could be wrong):

curl -s "http://a.certain.file/on?the=web" | zip -q /a/certain/local/zip/file.zip

but putting a line of bash in a Python script wouldn't be very elegant or convenient.

Another solution is to use urllib.request.urlretrieve to download the file and then pass the path to zipfile.ZipFile.write, but that way I would still have to wait for the download to complete, and it would also consume a lot more disk I/O.

Is there a way, in Python, to pass the download stream directly to a zipfile writer, like the bash pipeline above?

You can use shutil.copyfileobj() to efficiently copy data between file objects:

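from shutil import copyfileobj
from urllib.request import urlopen
from zipfile import ZipFile

# Writing to a member with open(..., "w") requires Python 3.6+.
# Mode "w" truncates an existing archive; use "a" to append, as in the question.
with ZipFile("/a/certain/local/zip/file.zip", "w") as zip_file:
    with zip_file.open("an.entry", "w") as entry:
        with urlopen("http://a.certain.file/on?the=web") as response:
            copyfileobj(response, entry)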

copyfileobj() repeatedly calls .read() with a fixed chunk size on the source file object and passes each chunk to the .write() method of the target file object, so only one chunk is held in memory at a time.
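For illustration, here is roughly the loop that copyfileobj() performs, written out by hand as a minimal sketch. The helper name stream_to_zip_entry and the 64 KiB chunk size are hypothetical choices, and an in-memory io.BytesIO stands in for the urlopen() response. Two details are worth noting: ZipFile defaults to ZIP_STORED (no compression), so compression=zipfile.ZIP_DEFLATED is passed explicitly, and force_zip64=True lets a member whose size is not known in advance grow past 2 GiB. Like the answer above, this needs Python 3.6+.

```python
import io
import zipfile


def stream_to_zip_entry(src, zip_file, arcname, chunk_size=64 * 1024):
    """Copy the readable binary file object `src` into a new member
    `arcname` of the already-open ZipFile `zip_file`, one chunk at a time.
    (Hypothetical helper for illustration.)"""
    # force_zip64=True allows the member to exceed 2 GiB even though its
    # final size is unknown when writing starts.
    with zip_file.open(arcname, "w", force_zip64=True) as entry:
        while True:
            chunk = src.read(chunk_size)  # at most chunk_size bytes
            if not chunk:                 # b"" signals end of stream
                break
            entry.write(chunk)            # only this chunk is in memory


# Usage with an in-memory source; a urlopen() response is read the same way.
buf = io.BytesIO()
source = io.BytesIO(b"hello " * 1000)
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    stream_to_zip_entry(source, zf, "an.entry")
```

The member inherits the archive's compression setting, so the data is deflated as it streams through instead of being stored uncompressed.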

If you are using Python 3.5 or older (where you can't yet directly write to a ZipFile member), your only option is to stream to a temporary file first:

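from shutil import copyfileobj
from tempfile import NamedTemporaryFile
from urllib.request import urlopen
from zipfile import ZipFile

with ZipFile("/a/certain/local/zip/file.zip", "w") as zip_file:
    with NamedTemporaryFile() as cache:
        with urlopen("http://a.certain.file/on?the=web") as response:
            copyfileobj(response, cache)
        cache.flush()  # make sure all data is on disk before reopening by name
        # ZipFile.write(filename, arcname): local path first, member name second
        zip_file.write(cache.name, "an.entry")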

Using a NamedTemporaryFile() like this only works on POSIX systems. On Windows, a temporary file that is still open cannot be opened a second time by name, so you'd have to use a name generated by tempfile.mkstemp(), open the file from there, and use try...finally to clean up afterwards.
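A minimal sketch of that mkstemp() approach, assuming a hypothetical helper named add_stream_via_tempfile and again using io.BytesIO in place of the HTTP response. The temporary file is closed before ZipFile.write() reopens it by name, which is what makes this portable to Windows, and the finally clause removes it even if the download or the zip step fails.

```python
import io
import os
import shutil
import tempfile
import zipfile


def add_stream_via_tempfile(src, zip_file, arcname):
    """Spool the readable binary file object `src` to a temporary file on
    disk, then add that file to `zip_file` as member `arcname`.
    (Hypothetical helper for illustration; works on Python 3.5.)"""
    fd, tmp_path = tempfile.mkstemp()
    try:
        # Wrap the OS-level descriptor; closing `tmp` also closes `fd`.
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(src, tmp)
        # The file is closed now, so reopening it by name is safe on Windows.
        # ZipFile.write(filename, arcname): local path first, member name second.
        zip_file.write(tmp_path, arcname)
    finally:
        os.remove(tmp_path)  # clean up even if copying or zipping failed


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    add_stream_via_tempfile(io.BytesIO(b"payload " * 500), zf, "an.entry")
```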