Python's streaming `TarFile` and `os.pipe()`: Archive is incomplete

I'm trying to create a tar archive in Python and stream its bytes to a remote host while the archive is still being created. Communication with the remote host uses a custom protocol in which each message/packet carries a payload of a specific size.

To try parallel creation and reading of a tar archive, I wrote the following simple test script:

import tarfile
import threading
import os
import select

BLOCKSIZE = 4096

(r,w) = os.pipe()
wfd = os.fdopen(w, "wb")  # binary mode: tarfile writes bytes

def maketar(buf, paths):
    tar = tarfile.open(mode='w|', fileobj=buf)
    for p in paths:
        tar.add(p)
    tar.close()

x = threading.Thread(target=maketar, args=(wfd, ["1M", "2M"]))
x.start()

poller = select.poll()
poller.register(r, select.POLLIN)

with open("out/archive.tar", "wb") as outf:
    while True:
        if poller.poll(10):
            outf.write(os.read(r, BLOCKSIZE))
        elif not x.is_alive():
            break

The files 1M and 2M are supposed to be packed into out/archive.tar. However, the archive is corrupt after the script finishes:

$ tar xf archive.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
$ ls -la
total 4.0M
-rw-rw-r-- 1 xx xx  1.0M Nov  8 11:37 1M
-rw-rw-r-- 1 xx xx 1023K Nov  8 13:12 2M
-rw-rw-r-- 1 xx xx  2.0M Nov  8 12:55 archive.tar

Both files should be of size 1M; the size of the archive is approximately correct, but the extracted 2M is too small. What am I missing here? Is it a buffering issue with the os.pipe() file descriptors?

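It turns out I simply needed to call buf.flush() on the write buffer at the end of the maketar() function. The file object returned by os.fdopen() is buffered, and tar.close() does not close (or flush) a file object that was passed in via fileobj, so the tail of the archive was still sitting in that buffer when the reader loop exited. It works fine now.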

def maketar(buf, paths):
    tar = tarfile.open(mode='w|', fileobj=buf)
    for p in paths:
        tar.add(p)
    tar.close()
    buf.flush()
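
As a possible variation on the fix above, here is a minimal sketch that closes the write end of the pipe instead of only flushing it. Closing flushes the buffer and also lets the reader detect end-of-file, so the select.poll() timeout and the is_alive() check are no longer needed. The function name stream_tar and its parameters are hypothetical, not part of the original script, and the sketch assumes Python 3 with the pipe opened in binary mode.

```python
import os
import tarfile
import threading

BLOCKSIZE = 4096


def maketar(wfd, paths):
    # Leaving the outer "with" closes wfd, which flushes its buffer and
    # closes the pipe's write end, so the reader eventually sees EOF.
    with wfd:
        with tarfile.open(mode="w|", fileobj=wfd) as tar:
            for p in paths:
                tar.add(p)


def stream_tar(paths, out_path, blocksize=BLOCKSIZE):
    # Hypothetical helper: build the archive in a thread and copy it
    # from the pipe to out_path in chunks of at most blocksize bytes.
    r, w = os.pipe()
    wfd = os.fdopen(w, "wb")  # binary mode, since tarfile writes bytes
    writer = threading.Thread(target=maketar, args=(wfd, paths))
    writer.start()
    try:
        with open(out_path, "wb") as outf:
            while True:
                chunk = os.read(r, blocksize)
                if not chunk:  # b"" means the write end was closed
                    break
                outf.write(chunk)
    finally:
        os.close(r)
        writer.join()
```

With the original test files this would be called as stream_tar(["1M", "2M"], "out/archive.tar"). Note that os.read() may return fewer than blocksize bytes, so a protocol that needs exactly sized payloads would still have to accumulate chunks before sending them.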
