
Python's streaming `TarFile` and `os.pipe()`: Archive is incomplete

I'm trying to create a tar archive in Python and, while it is being created, send/stream its bytes to a remote host. The communication with the remote host uses a custom protocol in which each message/packet carries a payload of a specific size.

To try creating and reading a tar archive in parallel, I wrote the following simple test script:

import tarfile
import threading
import os
import select

BLOCKSIZE = 4096

(r, w) = os.pipe()
# tarfile writes bytes, so the write end must be opened in binary mode
wfd = os.fdopen(w, "wb")

def maketar(buf, paths):
    # Stream an uncompressed archive ('w|') into buf
    tar = tarfile.open(mode='w|', fileobj=buf)
    for p in paths:
        tar.add(p)
    tar.close()

x = threading.Thread(target=maketar, args=(wfd, ["1M", "2M"]))
x.start()

# Watch the read end of the pipe for incoming data
poller = select.poll()
poller.register(r, select.POLLIN)

with open("out/archive.tar", "wb") as outf:
    while True:
        if poller.poll(10):                    # data arrived within 10 ms
            outf.write(os.read(r, BLOCKSIZE))
        elif not x.is_alive():                 # writer finished, nothing left to read
            break

The files 1M and 2M are supposed to be packed into out/archive.tar. However, the archive is corrupt after the script finishes:

$ tar xf archive.tar
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
$ ls -la
total 4.0M
-rw-rw-r-- 1 xx xx  1.0M Nov  8 11:37 1M
-rw-rw-r-- 1 xx xx 1023K Nov  8 13:12 2M
-rw-rw-r-- 1 xx xx  2.0M Nov  8 12:55 archive.tar

Both files should be 1M in size; the size of the archive is approximately correct, but the extracted 2M is too small. What am I missing here? Is it a buffering issue with the os.pipe() file descriptors?
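One way to test that hypothesis in isolation is a minimal sketch like the following (Unix-only, since it uses select.poll like the script above). It assumes the write end is wrapped with os.fdopen(..., "wb") and that a 100-byte write fits in that file object's internal buffer. It suggests the raw pipe is not what holds bytes back: data written through the buffered file object only becomes readable on the other end after flush().

```python
import os
import select

# Wrap the pipe's write end in a buffered binary file object,
# as the script above does with os.fdopen().
r, w = os.pipe()
wfd = os.fdopen(w, "wb")

poller = select.poll()
poller.register(r, select.POLLIN)

wfd.write(b"x" * 100)          # assumed to fit in wfd's internal buffer
before_flush = poller.poll(0)  # timeout 0 ms: check without waiting

wfd.flush()                    # hand the buffered bytes to the kernel
after_flush = poller.poll(0)   # the read end should now be readable

data = os.read(r, 4096)
wfd.close()
os.close(r)

print(before_flush)  # []
print(len(data))     # 100
```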

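It turns out I simply needed to call buf.flush() at the end of the maketar() function. tar.close() writes the final blocks of the archive, but it does not close or flush a file object that was passed in via fileobj, so the tail of the archive stayed in wfd's internal buffer and never reached the pipe. It works fine now.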

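def maketar(buf, paths):
    tar = tarfile.open(mode='w|', fileobj=buf)
    for p in paths:
        tar.add(p)
    tar.close()
    # tar.close() leaves an external fileobj open and unflushed,
    # so push its buffered bytes into the pipe explicitly
    buf.flush()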
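A possible variant, sketched under some assumptions, avoids both the flush problem and the poll()/is_alive() loop: the writer thread owns the write end inside a `with` block, so leaving it flushes the buffer and closes the descriptor, and the reader just reads until EOF (an empty read). To mirror the fixed-size-payload protocol from the question, the reader re-chunks the stream; `PAYLOAD_SIZE`, `stream_tar()`, `send_packet` and `demo()` are hypothetical names standing in for the custom protocol, and `arcname=os.path.basename(p)` is only there to keep the demo's member names short.

```python
import io
import os
import tarfile
import tempfile
import threading

PAYLOAD_SIZE = 4096  # hypothetical: payload size dictated by the custom protocol


def maketar(w, paths):
    # Leaving the outer "with" flushes the buffered writer AND closes the
    # pipe's write end, so the reader sees EOF once the archive is complete.
    with os.fdopen(w, "wb") as buf:
        with tarfile.open(mode="w|", fileobj=buf) as tar:
            for p in paths:
                tar.add(p, arcname=os.path.basename(p))


def stream_tar(paths, send_packet):
    """Build a tar of `paths` in a thread and pass it to `send_packet`
    in PAYLOAD_SIZE pieces (only the last piece may be shorter)."""
    r, w = os.pipe()
    writer = threading.Thread(target=maketar, args=(w, paths))
    writer.start()
    pending = b""
    # Unbuffered raw reader: read() returns b"" only at EOF.
    with os.fdopen(r, "rb", buffering=0) as reader:
        while True:
            chunk = reader.read(PAYLOAD_SIZE)
            if not chunk:
                break
            pending += chunk
            # Emit only full-size payloads; keep the remainder for later.
            while len(pending) >= PAYLOAD_SIZE:
                send_packet(pending[:PAYLOAD_SIZE])
                pending = pending[PAYLOAD_SIZE:]
    writer.join()
    if pending:
        send_packet(pending)  # final, possibly short, payload


def demo():
    # Two 1 MiB files, mirroring the question's "1M" and "2M".
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for name in ("1M", "2M"):
            path = os.path.join(d, name)
            with open(path, "wb") as f:
                f.write(os.urandom(1 << 20))
            paths.append(path)
        packets = []  # stand-in for the remote host
        stream_tar(paths, packets.append)
    # Reassemble the payloads and check the archive is readable.
    with tarfile.open(fileobj=io.BytesIO(b"".join(packets))) as tar:
        members = [(m.name, m.size) for m in tar.getmembers()]
    return members, [len(p) for p in packets]


if __name__ == "__main__":
    print(demo())
```

If the protocol requires every payload to be exactly `PAYLOAD_SIZE`, the last one would need padding, which tar tolerates after the end-of-archive blocks. Note also that an exception inside `maketar()` still closes the pipe, so the reader would see a normal EOF; propagating the writer's error is left out of this sketch.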
