
Can't pickle defaultdict

I have a defaultdict that looks like this:

dict1 = defaultdict(lambda: defaultdict(int))

The problem is that I can't pickle it using cPickle. One of the solutions I found here is to use a module-level function instead of a lambda. My question is: what is a module-level function, and how can I use the dictionary with cPickle?

In addition to Martijn's explanation:

A module-level function is a function defined at module level. That means it is not an instance method of a class, it is not nested within another function, and it is a "real" function with a name, not a lambda function.

So, to pickle your defaultdict, create it with a module-level function instead of a lambda function:

import pickle
from collections import defaultdict

def dd():
    return defaultdict(int)

dict1 = defaultdict(dd)  # dd is a module-level function

Then you can pickle it:

tmp = pickle.dumps(dict1) # no exception
new = pickle.loads(tmp)

Pickle wants to store all the instance attributes, and defaultdict instances store a reference to the default callable. Pickle recurses over each instance attribute.

Pickle cannot handle lambdas; pickle only ever handles data, not code, and lambdas contain code. Functions can be pickled, but just like class definitions, only if the function can be imported. A function defined at the module level can be imported. In that case pickle just stores a string: the full 'path' of the function, to be imported and referenced again when unpickling.

You can, however, use partial to accomplish this:

>>> import pickle
>>> from collections import defaultdict
>>> from functools import partial
>>> pickle.loads(pickle.dumps(defaultdict(partial(defaultdict, int))))
defaultdict(<functools.partial object at 0x94dd16c>, {})
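The points above can be checked with a short sketch. It assumes Python 3, where cPickle has been folded into pickle, and the variable names are only illustrative: a class is pickled as its import path, a lambda factory makes pickling fail, and a partial built from importable pieces round-trips.

```python
import pickle
from collections import defaultdict
from functools import partial

# A class (like a module-level function) is pickled by reference:
# the payload holds the import path, not the code.
payload = pickle.dumps(defaultdict)
print(b"collections" in payload, b"defaultdict" in payload)

# A lambda has no importable name, so pickling a defaultdict that uses one fails.
try:
    pickle.dumps(defaultdict(lambda: defaultdict(int)))
    lambda_pickled = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_pickled = False

# partial(defaultdict, int) carries only importable pieces, so it round-trips.
nested = defaultdict(partial(defaultdict, int))
nested["a"]["b"] += 1
restored = pickle.loads(pickle.dumps(nested))
restored["x"]["y"] += 2  # the default factory still works after unpickling
```

The exact exception type raised for the lambda can differ between Python versions, which is why the sketch catches more than one.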

To do this, just write the code you wanted to write. I'd use dill, which can serialize lambdas and defaultdicts. Dill can serialize almost anything in Python.

>>> import dill
>>> from collections import defaultdict
>>>
>>> dict1 = defaultdict(lambda: defaultdict(int))
>>> pdict1 = dill.dumps(dict1)
>>> _dict1 = dill.loads(pdict1)
>>> _dict1
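defaultdict(<function <lambda> at 0x10b31b398>, {})

dict1 = defaultdict(lambda: defaultdict(int))
cPickle.dump(dict(dict1), file_handle)

This worked for me. Converting the outer defaultdict to a plain dict drops the lambda; the inner defaultdict(int) values still pickle, because int can be imported.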

If you don't care about preserving the defaultdict type, convert it:

fname = "file.pkl"

for value in nested_default_dict:
    nested_default_dict[value] = dict(nested_default_dict[value])
my_dict = dict(nested_default_dict)

with open(fname, "wb") as f:
    pickle.dump(my_dict, f)  # Now this will work

I think this is a great alternative, since by the time you are pickling, the object is probably in its final form. And if you really do need the defaultdict type again, you can simply convert it back after you unpickle:

# `type` is a placeholder for your default factory (e.g. int), not the builtin `type`
for value in my_dict:
    my_dict[value] = defaultdict(type, my_dict[value])
nested_default_dict = defaultdict(type, my_dict)
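The loop above handles two levels of nesting. For deeper structures, a recursive version is one option. This is a minimal sketch, and `to_plain_dict` is a hypothetical helper name, not something from the answer:

```python
import pickle
from collections import defaultdict

def to_plain_dict(obj):
    """Recursively convert dicts and defaultdicts to plain dicts (hypothetical helper)."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj

nested = defaultdict(lambda: defaultdict(int))
nested["a"]["b"] += 1
nested["a"]["c"] += 2

plain = to_plain_dict(nested)
# Only plain dicts remain, so there is no default factory left to pickle.
restored = pickle.loads(pickle.dumps(plain))
```

The trade-off is the same as in the answer: missing keys in `restored` raise KeyError until you wrap it in a defaultdict again.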

Replacing the anonymous lambda function with a normal function worked for me. As pointed out by Mike, pickle cannot handle lambdas; pickle only handles data. Hence, changing the defaultdict construction from:

    dict_ = defaultdict(lambda: default_value)

to:

    def default_():
        return default_value

and then creating the default dict as follows worked for me:

    dict_ = defaultdict(default_)

I'm currently doing something similar to the asker; however, I'm using a subclass of defaultdict which has a member function that is used as the default_factory. To make my code work properly (I required the function to be defined at runtime), I simply added some code to prepare the object for pickling.

Instead of:

...
pickle.dump(dict, file)
...

I use this:

....
factory = dict.default_factory
dict.default_factory = None
pickle.dump(dict, file)
dict.default_factory = factory
...

This isn't the exact code I used, as my tree is an object which creates instances of the tree's own type as indexes are requested (so I use a recursive member function to do the pre/post pickle operations), but this pattern also answers the question.
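The same clear-then-restore pattern could be wrapped in a context manager so the factory is put back even if pickling raises. This is only a sketch of that idea, written for Python 3; `without_factory` is a hypothetical name, and it handles the top level only, not nested defaultdicts.

```python
import pickle
from collections import defaultdict
from contextlib import contextmanager

@contextmanager
def without_factory(dd):
    """Temporarily clear dd.default_factory (hypothetical helper)."""
    factory = dd.default_factory
    dd.default_factory = None
    try:
        yield dd
    finally:
        dd.default_factory = factory  # put back even if pickling raises

counts = defaultdict(lambda: 0)  # lambda factory: not picklable as-is
counts["seen"] += 1

with without_factory(counts):
    data = pickle.dumps(counts)

restored = pickle.loads(data)
```

Note that `restored` comes back with `default_factory` set to None, so a factory has to be assigned again after loading if default values are still needed.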

Here is a solution that still works as a one-liner for this case, and is actually more efficient than the lambda (or an equivalent def-ed function) to boot:

dict1 = defaultdict(defaultdict(int).copy)

That just makes a template defaultdict(int), and binds its copy method as the default factory for the outer defaultdict. Everything in there is picklable, and on CPython (where defaultdict is a built-in type implemented in C) it's more efficient than invoking any user-defined function to do the same job. No need for extra imports, wrapping, etc.
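A quick round trip, sketched here for Python 3 with illustrative variable names, shows the copy-based factory surviving pickling and handing out a separate inner dict per missing key:

```python
import pickle
from collections import defaultdict

dict1 = defaultdict(defaultdict(int).copy)
dict1["a"]["b"] += 1

restored = pickle.loads(pickle.dumps(dict1))
restored["x"]["y"] += 3  # the factory survives the round trip

# Each missing key gets its own copy of the (empty) template.
independent = restored["p"] is not restored["q"]
```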

Here is a function that builds a defaultdict with an arbitrary base type and an arbitrary depth of nesting.

def wrap_defaultdict(instance, times):
    """Wrap an instance an arbitrary number of `times` to create nested defaultdict.
    
    Parameters
    ----------
    instance - e.g., list, dict, int, collections.Counter
    times - the number of nested keys above `instance`; if `times=3` dd[one][two][three] = instance
    
    Notes
    -----
    using `x.copy` allows pickling (loading to ipyparallel cluster or pkldump)
        - thanks https://stackoverflow.com/questions/16439301/cant-pickle-defaultdict
    """
    from collections import defaultdict

    def _dd(x):
        return defaultdict(x.copy)

    dd = defaultdict(instance)
    for i in range(times-1):
        dd = _dd(dd)

    return dd
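As a usage sketch (Python 3 assumed), the function's logic is restated compactly below only so that the snippet runs on its own; the names `tree` and `leaf` are illustrative.

```python
import pickle
from collections import defaultdict

def wrap_defaultdict(instance, times):
    # Same logic as the function above, restated so this snippet is self-contained.
    dd = defaultdict(instance)
    for _ in range(times - 1):
        dd = defaultdict(dd.copy)
    return dd

tree = wrap_defaultdict(int, 3)
tree["one"]["two"]["three"] += 5

restored = pickle.loads(pickle.dumps(tree))
leaf = restored["x"]["y"]["z"]  # three levels deep, falls back to int()
```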
