More efficient way to pickle a string

The pickle module seems to use string escape characters when pickling; this becomes inefficient, e.g. on numpy arrays. Consider the following:

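import numpy, cPickle

z = numpy.zeros(1000, numpy.uint8)
len(z.dumps())                  # size of the array's own pickle string
len(cPickle.dumps(z.dumps()))   # size after pickling that string again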

The lengths are 1133 characters and 4249 characters respectively.

z.dumps() contains actual zero bytes (shown as "\x00\x00"), but pickle seems to be using the string's repr() function, writing each byte out as the four-character escape sequence \x00 (the zeros being ASCII "0" characters).

i.e. ("0" in z.dumps()) == False and ("0" in cPickle.dumps(z.dumps())) == True

Try using a later version of the pickle protocol, via the protocol parameter to pickle.dumps(). The default is 0, which is an ASCII text format. Pick a higher one (I suggest you use pickle.HIGHEST_PROTOCOL). Protocol formats 1 and 2 (and 3, but that's for py3k) are binary and should be more space-efficient.
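
To make the difference concrete, here is a small sketch that compares the text protocol with the highest binary one. It is written for Python 3 (where cPickle no longer exists and pickle is used directly), and the 1000 zero bytes are only a stand-in for the array data in the question, so the exact sizes will differ from the numbers quoted there.

```python
import pickle

# Stand-in payload (an assumption): 1000 zero bytes instead of a numpy array.
payload = bytes(1000)

# Protocol 0 is the ASCII text format; non-printable bytes get escaped.
text_pickle = pickle.dumps(payload, protocol=0)

# The highest protocol is binary and stores the bytes as they are.
binary_pickle = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

print(len(text_pickle), len(binary_pickle))

# Both forms load back to the original object.
assert pickle.loads(text_pickle) == payload
assert pickle.loads(binary_pickle) == payload
```

The binary pickle should come out only slightly larger than the payload itself, while the text one is several times bigger because every zero byte is written as an escape sequence.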

Solution:

import zlib, cPickle

def zdumps(obj):
  return zlib.compress(cPickle.dumps(obj,cPickle.HIGHEST_PROTOCOL),9)

def zloads(zstr):
  return cPickle.loads(zlib.decompress(zstr))  

>>> len(zdumps(z))
128
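
For anyone on Python 3, roughly the same helpers might look like the sketch below. It assumes pickle in place of cPickle and, again, uses plain zero bytes as a stand-in for the array so that it runs without numpy; the compressed size it prints will not match the 128 above.

```python
import pickle
import zlib

def zdumps(obj):
    # Pickle with the highest (binary) protocol, then compress at level 9.
    return zlib.compress(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL), 9)

def zloads(zbytes):
    # Reverse the two steps: decompress first, then unpickle.
    return pickle.loads(zlib.decompress(zbytes))

# Stand-in payload (an assumption): 1000 zero bytes instead of a numpy array.
payload = bytes(1000)
packed = zdumps(payload)
print(len(packed))
```

Highly repetitive data such as an all-zero buffer compresses very well; real arrays with noisy values will shrink much less.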

z.dumps() is already a pickled string, i.e. it can be unpickled using pickle.loads():

>>> import numpy, pickle
>>> z = numpy.zeros(1000, numpy.uint8)
>>> s = z.dumps()
>>> a = pickle.loads(s)
>>> all(a == z)
True

An improvement to vartec's answer that seems a bit more memory efficient (since it doesn't force everything into a string):

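import cPickle, gzip

def pickle(fname, obj):
    # Stream the pickle straight into the gzip file; "with" (Python 2.7+)
    # closes the file so all compressed data is flushed to disk.
    with gzip.open(fname, "wb", compresslevel=3) as f:
        cPickle.dump(obj, f, protocol=2)

def unpickle(fname):
    with gzip.open(fname, "rb") as f:
        return cPickle.load(f)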
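
A Python 3 take on the same idea could look like this sketch. The names dump_gz and load_gz are made up for the example (chosen so they don't shadow the pickle module), and pickle.HIGHEST_PROTOCOL is my substitution for the fixed protocol=2 above.

```python
import gzip
import pickle

def dump_gz(fname, obj):
    # Hypothetical helper: write the pickle stream directly into a gzip file,
    # so the full pickle never has to be held in memory as one string.
    with gzip.open(fname, "wb", compresslevel=3) as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_gz(fname):
    # Hypothetical helper: read the object back from the gzip file.
    with gzip.open(fname, "rb") as f:
        return pickle.load(f)
```

Usage would be something like dump_gz("data.pkl.gz", obj) followed later by obj = load_gz("data.pkl.gz"); the file name is just an example.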
