Write data directly to a tar archive

I am looking for a way to pickle some Python objects into a single, combined tar archive. I also need to use np.save(....) to save some NumPy arrays into that same archive. Of course, I also need to read them back later.

So what I tried is:

a = np.linspace(1,10,10000)    
tar = tarfile.open(fileName, "w")
tarinfo = tarfile.TarInfo.frombuf(np.save(a, fileName))
tar.close()

and I get the error:

'numpy.ndarray' object has no attribute 'write'

I get similar problems if I pickle an object into the tar file. Any suggestions? If it is easier, json-pickle would also work.

EDIT: As mentioned in the comments, I mixed up the arguments of np.save(). However, fixing that does not solve the issue, because now I get the error:

object of type 'NoneType' has no len()

EDIT 2: If there is no solution to the above problem, do you know of any other time-efficient way to bundle files?

First, I'm not an expert tar user, but I can point out a couple of things:

 a = np.linspace(1,10,10000)    

 tar = tarfile.open(fileName, "w")

If you want to add a file to an existing archive, use the "a" (append) mode (or study the other available modes). "w" creates a new, empty file.
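The difference between the two modes is easy to check. This is a minimal sketch that writes to a throwaway temporary directory; `add_member` and `member_names` are hypothetical helper names, and the member contents are placeholders:

```python
import io
import os
import tarfile
import tempfile


def add_member(path, mode, name, data):
    """Open the archive at `path` in `mode` and store the bytes `data` as `name`."""
    with tarfile.open(path, mode) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))


def member_names(path):
    """Return the member names currently stored in the archive."""
    with tarfile.open(path, "r") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.tar")
    add_member(path, "w", "first", b"1")   # "w": create (or truncate) the archive
    add_member(path, "a", "second", b"2")  # "a": append to the existing archive
    after_append = member_names(path)
    add_member(path, "w", "third", b"3")   # "w" again: earlier members are gone
    after_rewrite = member_names(path)

print(after_append)   # ['first', 'second']
print(after_rewrite)  # ['third']
```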

 tarinfo = tarfile.TarInfo.frombuf(np.save(a, fileName))

The correct use of np.save has already been mentioned: the file comes first and the array second, np.save(fileName, a).

A TarInfo object is not the file/data, but rather information about the file. That information is placed in the tar file before the data, in a 512-byte buffer. tobuf creates such a buffer from the attributes of the object; frombuf decodes such a buffer. It is used, for example, in the fromtarfile method:

@classmethod
def fromtarfile(cls, tarfile):
    """Return the next TarInfo object from TarFile object
       tarfile.
    """
    buf = tarfile.fileobj.read(BLOCKSIZE)
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
    obj.offset = tarfile.fileobj.tell() - BLOCKSIZE
    return obj._proc_member(tarfile)

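So clearly frombuf is not what you want to use here. It also explains the error in your edit: np.save returns None, so frombuf(np.save(...)) ends up calling len(None), hence "object of type 'NoneType' has no len()".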
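To make the header/data distinction concrete, here is a small sketch that round-trips a TarInfo through tobuf and frombuf. The name "anArray" and the size 480 are illustrative values only; note that no array bytes are involved at any point.

```python
import tarfile

# A TarInfo holds only metadata; the name and size here are illustrative.
info = tarfile.TarInfo(name="anArray")
info.size = 480

# tobuf() encodes that metadata into the header block that precedes the data.
header = info.tobuf()
print(len(header), tarfile.BLOCKSIZE)  # 512 512

# frombuf() goes the other way: it decodes a header block into a TarInfo.
decoded = tarfile.TarInfo.frombuf(header, "utf-8", "surrogateescape")
print(decoded.name, decoded.size)  # anArray 480

# Anything that is not a header block is rejected; np.save(...) returns None,
# and frombuf fails as soon as it calls len() on it.
try:
    tarfile.TarInfo.frombuf(None, "utf-8", "surrogateescape")
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # object of type 'NoneType' has no len()
```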

A 2009 SO question, "python write string directly to tarfile", shows that it is possible to write directly to a tar file by using a string buffer. From the accepted answer:

# create a `StringIO` object, and fill it
string = StringIO.StringIO()
...
# create `TarInfo` object:
info = tarfile.TarInfo(name="foo")
info.size=len(string.buf)
# use both with `addfile`:
tar.addfile(tarinfo=info, fileobj=string)
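That snippet is Python 2 (StringIO.StringIO and its .buf attribute no longer exist in Python 3). A rough Python 3 equivalent uses io.BytesIO instead. This is a sketch; add_bytes is a hypothetical helper name. The same pattern also covers pickled objects, because pickle.dumps returns bytes:

```python
import io
import pickle
import tarfile


def add_bytes(tar, name, data):
    """Hypothetical helper: store an in-memory bytes object as a tar member."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)  # the header must record the exact byte count
    tar.addfile(tarinfo=info, fileobj=io.BytesIO(data))


archive = io.BytesIO()  # the archive itself can live in memory too
with tarfile.open(fileobj=archive, mode="w") as tar:
    add_bytes(tar, "foo", "some text".encode("utf-8"))
    add_bytes(tar, "settings.pkl", pickle.dumps({"alpha": 1, "beta": [2, 3]}))

archive.seek(0)  # rewind before reopening for reading
with tarfile.open(fileobj=archive, mode="r") as tar:
    text = tar.extractfile("foo").read().decode("utf-8")
    settings = pickle.loads(tar.extractfile("settings.pkl").read())

print(text, settings)
```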

I think you can np.save to a StringIO buffer, but I'd have to check/test to be sure. For ordinary arrays, save writes a small header with the shape and dtype info, and then appends the array's data buffer. For other objects (and object-dtype arrays) it resorts to pickle.
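A quick check of that idea, assuming Python 3 and a reasonably recent NumPy. np.save writes bytes, so the buffer has to be io.BytesIO (a text buffer such as io.StringIO will not accept them). The dictionary at the end is only an example of a non-array object:

```python
import io

import numpy as np

a = np.linspace(1, 10, 10000)

buf = io.BytesIO()
np.save(buf, a)  # file first, array second
raw = buf.getvalue()

print(raw[:6])               # the magic string that starts every .npy file
print(len(raw) - a.nbytes)   # bytes taken up by the .npy header

buf.seek(0)  # rewind before reading back
b = np.load(buf)
print(np.array_equal(a, b))

# A non-array object is wrapped in a 0-d object array and pickled;
# loading it back requires allow_pickle=True.
obj_buf = io.BytesIO()
np.save(obj_buf, {"answer": 42}, allow_pickle=True)
obj_buf.seek(0)
restored = np.load(obj_buf, allow_pickle=True).item()
print(restored)
```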

I'd suggest first getting a regular np.save to a file, followed by addfile, working. Then see whether writing to a string buffer works and whether it saves any time.
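The first step (a plain np.save to disk, then addfile) might look like this. It is a sketch that uses a temporary directory so nothing is left behind; the file names are arbitrary:

```python
import os
import tarfile
import tempfile

import numpy as np

a = np.linspace(1, 10, 10000)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "a.npy")
    tar_path = os.path.join(tmp, "bundle.tar")

    # Step 1: an ordinary np.save to a file on disk.
    np.save(npy_path, a)

    # Step 2: addfile needs a TarInfo header; gettarinfo builds one from the
    # file's stat data, and arcname drops the temporary directory prefix.
    with tarfile.open(tar_path, "w") as tar, open(npy_path, "rb") as f:
        info = tar.gettarinfo(npy_path, arcname="a.npy")
        tar.addfile(info, f)

    with tarfile.open(tar_path, "r") as tar:
        names = tar.getnames()
        stored_size = tar.getmember("a.npy").size
    file_size = os.path.getsize(npy_path)

print(names, stored_size == file_size)
```

For files already on disk, tar.add(npy_path, arcname="a.npy") does both parts of step 2 in a single call.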


Here's a test script. It writes one array to a tar file, closes and reopens the file, writes another, and finally extracts the files and loads them. The returned shapes look fine. I haven't looked at whether it is possible to extract these files to memory buffers.

np.savez could do the same thing with zip archiving (rather than tar).
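For comparison, a minimal np.savez sketch that stores the same two arrays under keyword names. A BytesIO buffer is used here; a filename such as "bundle.npz" would work the same way:

```python
import io

import numpy as np

buf = io.BytesIO()
np.savez(buf, anArray=np.arange(100), anOther=np.ones((2, 3, 4)))

buf.seek(0)
with np.load(buf) as npz:  # NpzFile: lazy, dict-like access by keyword name
    names = sorted(npz.files)
    a = npz["anArray"]  # indexing reads the array fully into memory
    b = npz["anOther"]

print(names, a.shape, b.shape)
```

One difference from the tar approach: each np.savez call writes a complete new archive, so all the arrays go in a single call rather than being appended later. np.savez_compressed has the same interface if size matters more than speed.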

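import io   # python3 version
import tarfile

import numpy as np

# First array: np.save into an in-memory buffer, then rewind it for reading
abuf = io.BytesIO()
np.save(abuf, np.arange(100))
abuf.seek(0)

# "w" creates a new archive; the TarInfo header must record the data size
tar = tarfile.TarFile('test.tar', 'w')
info = tarfile.TarInfo(name='anArray')
info.size = len(abuf.getbuffer())
tar.addfile(tarinfo=info, fileobj=abuf)
tar.close()

# Second array: reopen in "a" mode so the first member is kept
abuf = io.BytesIO()
np.save(abuf, np.ones((2, 3, 4)))
abuf.seek(0)

tar = tarfile.TarFile('test.tar', 'a')
info = tarfile.TarInfo(name='anOther')
info.size = len(abuf.getbuffer())
tar.addfile(tarinfo=info, fileobj=abuf)
tar.close()

# Read back: list the members, extract them to disk, and load them
tar = tarfile.TarFile('test.tar', 'r')
print(tar.getnames())     # ['anArray', 'anOther']
tar.extractall()
# can I extract to buffers?
tar.close()
a = np.load('anArray')
b = np.load('anOther')
print(a.shape, b.shape)   # (100,) (2, 3, 4)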

Also, the archive listing from the shell:

1415:~/mypy$ tar -tvf test.tar 
-rw-r--r-- 0/0             480 1969-12-31 16:00 anArray 
-rw-r--r-- 0/0             272 1969-12-31 16:00 anOther
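On the script's open question (# can I extract to buffers?): TarFile.extractfile returns a file object for a member's data, so nothing has to touch the disk. This sketch keeps the whole archive in memory; add_array is a hypothetical helper name:

```python
import io
import tarfile

import numpy as np


def add_array(tar, name, arr):
    """Hypothetical helper: np.save an array into the archive without a temp file."""
    buf = io.BytesIO()
    np.save(buf, arr)
    info = tarfile.TarInfo(name=name)
    info.size = buf.tell()  # position after writing == number of bytes written
    buf.seek(0)
    tar.addfile(tarinfo=info, fileobj=buf)


archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tar:
    add_array(tar, "anArray", np.arange(100))
    add_array(tar, "anOther", np.ones((2, 3, 4)))

archive.seek(0)
loaded = {}
with tarfile.open(fileobj=archive, mode="r") as tar:
    for member in tar.getmembers():
        # extractfile gives a read-only file object for the member's data
        data = tar.extractfile(member).read()
        loaded[member.name] = np.load(io.BytesIO(data))

print({name: arr.shape for name, arr in loaded.items()})
```

Swapping `fileobj=archive` for a filename such as 'test.tar' reads from (or writes to) a real file instead.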
