Write data directly to a tar archive
I am looking for a way to pickle some Python objects into a combined tar archive. I also need to use np.save(....) to save some numpy arrays in the same archive. Of course, I also need to read them back later.
So what I tried is:
a = np.linspace(1,10,10000)
tar = tarfile.open(fileName, "w")
tarinfo = tarfile.TarInfo.frombuf(np.save(a, fileName))
tar.close()
and I get the error:
'numpy.ndarray' object has no attribute 'write'
I get similar problems if I pickle an object into the tar file. Any suggestions? If it is easier, json-pickle would also work.
EDIT: As mentioned in the comments, I confused the arguments of np.save(). However, fixing that does not solve the issue, as now I get the error:
object of type 'NoneType' has no len()
EDIT 2: If there is no solution to the above problem, do you know of any other time-efficient way to bundle files?
First, I'm not an expert tar user, but I can point out a couple of things:
a = np.linspace(1,10,10000)
tar = tarfile.open(fileName, "w")
If you want to add a file to an existing archive, use the "a" mode (or study the available modes). "w" creates a new, empty archive.
tarinfo = tarfile.TarInfo.frombuf(np.save(a, fileName))
The correct use of np.save has already been mentioned.
A TarInfo object is not the file/data, but rather information about the file. That information is placed in the tar file before the data, in a 512 byte buffer. tobuf creates such a buffer from the attributes of the object. frombuf decodes such a buffer. It is used, for example, in the fromtarfile method:
def fromtarfile(cls, tarfile):
    """Return the next TarInfo object from TarFile object
       tarfile.
    """
    buf = tarfile.fileobj.read(BLOCKSIZE)
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
    obj.offset = tarfile.fileobj.tell() - BLOCKSIZE
    return obj._proc_member(tarfile)
So clearly frombuf is not what you want to use here.
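To make the header-versus-data distinction concrete, here is a minimal sketch of a round trip through tobuf and frombuf. The member name and size are made up for illustration, and I pass USTAR_FORMAT explicitly so that the header is a single block; treat it as a demonstration of what these methods do, not as a way to write data.

```python
import tarfile

# Hypothetical member: the name and size are illustrative only.
info = tarfile.TarInfo(name="example.npy")
info.size = 1234

# tobuf() serialises the metadata (not the file contents) into a header block.
header = info.tobuf(format=tarfile.USTAR_FORMAT)
print(len(header), tarfile.BLOCKSIZE)

# frombuf() decodes such a header back into a TarInfo object.
decoded = tarfile.TarInfo.frombuf(header, "utf-8", "surrogateescape")
print(decoded.name, decoded.size)
```

Nothing here touches the array data itself, which is why passing the result of np.save to frombuf cannot work.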
A 2009 SO question, "python write string directly to tarfile", shows that it is possible to write directly to a tarfile by using a string buffer. From the accepted answer:
# create a `StringIO` object, and fill it
string = StringIO.StringIO()
...
# create `TarInfo` object:
info = tarfile.TarInfo(name="foo")
info.size=len(string.buf)
# use both with `addfile`:
tar.addfile(tarinfo=info, fileobj=string)
I think you can do a np.save to a StringIO buffer, but I'd have to check/test to be sure. For ordinary arrays, save writes a header with size, shape, dtype info, and then adds the array's data buffer. For other objects and arrays it resorts to pickle.
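As a quick check of that idea, the sketch below saves a small array into an in-memory buffer and looks at the first bytes. It assumes Python 3, where the buffer has to be io.BytesIO rather than StringIO because np.save writes bytes; the array contents are arbitrary.

```python
import io

import numpy as np

# np.save accepts any writable binary file-like object.
buf = io.BytesIO()
np.save(buf, np.arange(5))

# The .npy data starts with a magic string, followed by the header
# (shape, dtype, ordering) and then the raw array data.
raw = buf.getvalue()
print(raw[:6])

# Rewind before reading the array back from the same buffer.
buf.seek(0)
restored = np.load(buf)
print(restored)
```

The length of raw is what would go into TarInfo.size when the buffer is handed to addfile.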
I'd suggest first getting a regular np.save to a file, followed by addfile, working. Then see if writing to a string buffer works and whether it saves any time.
np.savez could do the same thing with zip archiving (rather than tar).
Here's a test script. It writes one array to a tar file, closes and reopens the file and writes another, and finally it extracts the files and loads them. Returned shapes look fine. I haven't looked at whether it is possible to extract these files to memory buffers or not.
import numpy as np
import tarfile
import io # python3 version
abuf = io.BytesIO()
np.save(abuf, np.arange(100))
abuf.seek(0)
tar=tarfile.TarFile('test.tar','w')
info= tarfile.TarInfo(name='anArray')
info.size=len(abuf.getbuffer())
tar.addfile(tarinfo=info, fileobj=abuf)
tar.close()
abuf = io.BytesIO()
np.save(abuf, np.ones((2,3,4)))
abuf.seek(0)
tar=tarfile.TarFile('test.tar','a')
info= tarfile.TarInfo(name='anOther')
info.size=len(abuf.getbuffer())
tar.addfile(tarinfo=info, fileobj=abuf)
tar.close()
tar=tarfile.TarFile('test.tar','r')
print(tar.getnames())
tar.extractall()
# can I extract to buffers?
tar.close()
a=np.load('anArray')
b=np.load('anOther')
print(a.shape, b.shape)
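On the open question of extracting to buffers: TarFile.extractfile returns a readable file object for a member, so a variant along the following lines should avoid writing anything to disk. This is a sketch under my own assumptions: add_array and load_array are hypothetical helper names, and the archive itself is kept in a BytesIO purely to make the example self-contained.

```python
import io
import tarfile

import numpy as np


def add_array(tar, name, array):
    """Serialise `array` with np.save into memory and add it as member `name`."""
    buf = io.BytesIO()
    np.save(buf, array)
    info = tarfile.TarInfo(name=name)
    info.size = buf.getbuffer().nbytes
    buf.seek(0)
    tar.addfile(tarinfo=info, fileobj=buf)


def load_array(tar, name):
    """Read member `name` into memory and decode it with np.load."""
    member = tar.extractfile(name)
    # Copy into BytesIO so np.load gets a plain seekable buffer.
    return np.load(io.BytesIO(member.read()))


# In-memory archive for the demo; a real file name would work the same way.
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tar:
    add_array(tar, "anArray", np.arange(100))
    add_array(tar, "anOther", np.ones((2, 3, 4)))

archive.seek(0)
with tarfile.open(fileobj=archive, mode="r") as tar:
    names = tar.getnames()
    a = load_array(tar, "anArray")
    b = load_array(tar, "anOther")

print(names, a.shape, b.shape)
```

Whether this is faster than extractall followed by np.load on the files would need to be timed on real data.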
Listing the archive from the shell also shows both members:
1415:~/mypy$ tar -tvf test.tar
-rw-r--r-- 0/0 480 1969-12-31 16:00 anArray
-rw-r--r-- 0/0 272 1969-12-31 16:00 anOther
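The same addfile pattern should cover the pickled objects from the question. The sketch below is untested against the asker's data and uses hypothetical helper names (add_pickled, load_pickled) and a made-up settings dict; pickle.dumps gives the bytes directly, so the size is known before the member is added.

```python
import io
import pickle
import tarfile


def add_pickled(tar, name, obj):
    """Pickle `obj` in memory and store it in the archive as member `name`."""
    payload = pickle.dumps(obj)
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(tarinfo=info, fileobj=io.BytesIO(payload))


def load_pickled(tar, name):
    """Read member `name` and unpickle it (only do this for trusted archives)."""
    return pickle.loads(tar.extractfile(name).read())


# Illustrative object; the archive lives in memory to keep the demo self-contained.
settings = {"fileName": "test.tar", "steps": [1, 2, 3]}

archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tar:
    add_pickled(tar, "settings.pkl", settings)

archive.seek(0)
with tarfile.open(fileobj=archive, mode="r") as tar:
    restored = load_pickled(tar, "settings.pkl")

print(restored)
```

Arrays and pickled objects can go into the same open TarFile, since each member is just a name, a size, and a stream of bytes.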