Is there a way to pickle a scipy.interpolate.Rbf() object?

I'm creating a radial basis function interpolation model for a rather large dataset. The main call, scipy.interpolate.Rbf(,), takes about one minute and 14 GB of RAM. Since not every machine this is supposed to run on is capable of doing this, and since the program will run on the same dataset very often, I'd like to pickle the results to a file. This is a simplified example:

import scipy.interpolate as inter
import numpy as np
import cPickle

x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
y = np.array([1,2,3,4])

rbfi = inter.Rbf(x[:,0], x[:,1], x[:,2], y)

RBFfile = open('picklefile','wb')
RBFpickler = cPickle.Pickler(RBFfile,protocol=2)
RBFpickler.dump(rbfi)
RBFfile.close()

The RBFpickler.dump() call results in a can't pickle <type 'instancemethod'> error. As I understand it, that means there's a method somewhere in there (well, rbfi() is callable), and that it can't be pickled for some reason I do not quite understand.
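
One way to narrow down which attributes trigger this kind of error is to try pickling each entry of the instance's __dict__ on its own. The following is only a rough sketch under stated assumptions: FakeModel is a made-up stand-in class (not scipy's Rbf), and find_unpicklable is a hypothetical helper name. The same loop could be pointed at a real rbfi object.

```python
import pickle


class FakeModel(object):
    """Hypothetical stand-in for a fitted model; NOT scipy's Rbf."""
    def __init__(self):
        self.nodes = [0.5, 1.5, 2.5]      # plain data: picklable
        self.epsilon = 2.0                # plain data: picklable
        self.kernel = lambda r: r * r     # a lambda: pickle refuses it


def find_unpicklable(obj, protocol=2):
    """Return the names of instance attributes that pickle cannot serialize."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value, protocol)
        except Exception:   # the exact error type differs between Python versions
            bad.append(name)
    return sorted(bad)


culprits = find_unpicklable(FakeModel())
print(culprits)   # prints ['kernel']
```

Everything that does not show up in the returned list is plain data that pickle can store without help.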

Does anyone know a way of either pickling this some other way, or of saving the results of the inter.Rbf() call some other way?

There are some arrays of shape (nd,n) and (n,n) in there (rbfi.A, rbfi.xi, rbfi.di, ...), which I assume store all the interesting information. I guess I could pickle just those arrays, but then I'm not sure how I could put the object together again...

Edit: Additional constraint: I'm not allowed to install additional libraries on the system. The only way I can include them is if they are pure Python and I can just include them with the script without having to compile anything.

I'd use dill to serialize the results, or, if you want a cached function, you could use klepto to cache the function call so that you minimize reevaluation of the function.

Python 2.7.6 (default, Nov 12 2013, 13:26:39) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy.interpolate as inter
>>> import numpy as np
>>> import dill
>>> import klepto
>>> 
>>> x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
>>> y = np.array([1,2,3,4])
>>> 
>>> # build an on-disk archive for numpy arrays,
>>> # with a dictionary-style interface  
>>> p = klepto.archives.dir_archive(serialized=True, fast=True)
>>> # add a caching algorithm, so when threshold is hit,
>>> # memory is dumped to disk
>>> c = klepto.safe.lru_cache(cache=p)
>>> # decorate the target function with the cache
>>> c(inter.Rbf)
<function Rbf at 0x104248668>
>>> rbf = _
>>> 
>>> # 'rbf' is now cached, so all repeat calls are looked up
>>> # from disk or memory
>>> d = rbf(x[:,0], x[:,1], x[:,2], y)
>>> d
<scipy.interpolate.rbf.Rbf object at 0x1042454d0>
>>> d.A
array([[ 1.        ,  1.22905719,  2.36542472,  1.70724365],
       [ 1.22905719,  1.        ,  1.74422655,  1.37605151],
       [ 2.36542472,  1.74422655,  1.        ,  1.70724365],
       [ 1.70724365,  1.37605151,  1.70724365,  1.        ]])
>>> 

Continuing:

>>> # the cache is serializing the result object behind the scenes
>>> # it also works if we directly pickle and unpickle it
>>> _d = dill.loads(dill.dumps(d))
>>> _d
<scipy.interpolate.rbf.Rbf object at 0x104245510>
>>> _d.A
array([[ 1.        ,  1.22905719,  2.36542472,  1.70724365],
       [ 1.22905719,  1.        ,  1.74422655,  1.37605151],
       [ 2.36542472,  1.74422655,  1.        ,  1.70724365],
       [ 1.70724365,  1.37605151,  1.70724365,  1.        ]])
>>>

Get klepto and dill here: https://github.com/uqfoundation
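
If installing klepto is not an option, the caching idea itself can be approximated with the standard library alone. This is a rough sketch and not klepto's API: the decorator name disk_cache, the stand-in function expensive_fit, and the temporary cache directory are all made up for illustration. It also only works when the arguments and the result are things plain pickle can already handle, so on its own it would not get around the Rbf pickling error described in the question.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(cache_dir):
    """Hypothetical decorator: keep each result in a pickle file keyed by the arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Derive a file name from the pickled call signature.
            key = pickle.dumps((func.__name__, args, sorted(kwargs.items())), 2)
            path = os.path.join(cache_dir, hashlib.sha1(key).hexdigest() + '.pkl')
            if os.path.exists(path):
                with open(path, 'rb') as f:
                    return pickle.load(f)      # cache hit: skip the computation
            result = func(*args, **kwargs)
            with open(path, 'wb') as f:
                pickle.dump(result, f, 2)      # cache miss: compute and store
            return result
        return wrapper
    return decorator


calls = []   # records how often the function body really runs


@disk_cache(tempfile.mkdtemp())
def expensive_fit(xs, ys):
    # Stand-in for the slow model-building call.
    calls.append(1)
    return {'coeffs': [x * y for x, y in zip(xs, ys)]}


first = expensive_fit((1, 2, 3), (4, 5, 6))
second = expensive_fit((1, 2, 3), (4, 5, 6))   # read back from disk
```

The second call returns the stored result without running the function body again, which is the behaviour the klepto session above relies on.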

Alright, Mike's solution seems to be a good one, but I found another in the meantime:

There are only two parts of an Rbf object that can't be pickled directly, and they are easy to recreate from scratch. Therefore my code now saves only the data parts:

import scipy.interpolate as inter
import numpy as np
import cPickle

x = np.array([[1,2,3],[3,4,5],[7,8,9],[1,5,9]])
y = np.array([1,2,3,4])

rbfi = inter.Rbf(x[:,0], x[:,1], x[:,2], y)

RBFfile = open('picklefile','wb')
RBFpickler = cPickle.Pickler(RBFfile,protocol=2)

# RBF can't be pickled directly, so save everything required for reconstruction
RBFdict = {}            
for key in rbfi.__dict__.keys():
    if key != '_function' and key!= 'norm':
        RBFdict[key] = rbfi.__getattribute__(key)   

RBFpickler.dump(RBFdict)
RBFfile.close()

This gives me a file containing all the information stored in the object. rbfi._function() and rbfi.norm are not saved. Luckily, they can be recreated from scratch by just initializing any (arbitrarily simple) Rbf object:

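## create a bare-bones RBF object ##
# Note: in a fresh session, load RBFdict from the pickle file first (next step),
# because the kernel name is read from RBFdict['function'].
rbfi = inter.Rbf(np.array([1,2,3]), np.array([10,20,30]),
                 np.array([1,2,3]), function=RBFdict['function'])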

This object's data parts are then replaced with the saved data:

RBFfile = open('picklefile','rb')
RBFunpickler = cPickle.Unpickler(RBFfile)
RBFdict = RBFunpickler.load()
RBFfile.close()

## replace rbfi's contents with what was saved ##
for key,value in RBFdict.iteritems():
    rbfi.__setattr__(key, value)

>>> rbfi(2,3,4)
array(1.4600661386382146)

It's apparently not even necessary to give the new Rbf object the same number of dimensions as the original one, as all of that will be overwritten.

That said, Mike's solution is probably the more universally applicable one, while this one is more platform-independent. I've had issues with moving pickled Kriging models between platforms, but this method for RBF models seems to be more robust. I haven't tested it much yet, though, so no guarantees given.
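
For reuse, the save and restore steps can be wrapped in two small helpers. This is a sketch of the same pattern, not tested against scipy: FakeModel is a made-up stand-in class, save_state and restore_state are hypothetical names, and the skip list simply mirrors the two attribute names excluded above. It is written so that it also runs on Python 3, where pickle replaces cPickle and dict.items() replaces iteritems().

```python
import os
import pickle
import tempfile


class FakeModel(object):
    """Hypothetical stand-in for a fitted model; NOT scipy's Rbf."""
    def __init__(self, nodes, epsilon=1.0):
        self.nodes = list(nodes)
        self.epsilon = epsilon
        self._function = lambda r: r * self.epsilon   # the unpicklable part

    def __call__(self, r):
        return sum(self._function(r - n) for n in self.nodes)


def save_state(obj, path, skip=('_function', 'norm')):
    """Pickle every instance attribute except the names listed in skip."""
    state = {k: v for k, v in vars(obj).items() if k not in skip}
    with open(path, 'wb') as f:
        pickle.dump(state, f, protocol=2)


def restore_state(obj, path):
    """Overwrite the attributes of a freshly built object with the saved ones."""
    with open(path, 'rb') as f:
        state = pickle.load(f)
    for key, value in state.items():
        setattr(obj, key, value)
    return obj


original = FakeModel([1.0, 2.0, 4.0], epsilon=3.0)
path = os.path.join(tempfile.mkdtemp(), 'model_state.pkl')
save_state(original, path)

# Build a throwaway instance, then replace its data with the saved state.
restored = restore_state(FakeModel([0.0]), path)
print(restored(5.0) == original(5.0))   # prints True
```

The approach relies on the skipped attributes being rebuilt correctly by the constructor of the throwaway object, so the constructor arguments that determine them (here none, in the Rbf case the function name) have to match the original.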
