
Python: Transfer a class method to another computer

I have created a class that is used for analyzing a specific type of data that I produce. I use this class on a local computer, but occasionally there is too much data to process locally, so I wanted to add an option to one of the methods so that it can submit the job to a computer cluster. It mostly works, except that I am struggling to transfer a class method to the cluster.

My class looks like this:

class Analysis():
    def __init__(self, data, *other_parameters):  # INPUT_PARAMETERS ETC
        self.data = data
        # OTHER_STUFF...

    @staticmethod
    def staticMethod1(input1, input2):
        # PERFORM SOME KIND OF CALCULATION ON INPUT1 AND INPUT2 AND RETURN THE RESULT
        return output

    @staticmethod
    def staticMethod2(input1, input2):
        # PERFORM SOME KIND OF CALCULATION ON INPUT1 AND INPUT2 AND RETURN THE RESULT
        return output

    # MORE STATIC METHODS

    @staticmethod
    def staticMethodN(input1, input2):
        # PERFORM SOME KIND OF CALCULATION ON INPUT1 AND INPUT2 AND RETURN THE RESULT
        return output

    def createArray(self, function):
        # CREATE AN ARRAY BY APPLYING FUNCTION TO SELF.DATA
        return array

So the createArray method gets called and the user passes the static method that should be used to calculate the array. When I want the array in createArray to be created on the cluster, I save the static method that was passed in (e.g. staticMethod1) to a pickle file using dill.dump. The pickle file is then transferred to the cluster, but when I try to load the method from it I get ModuleNotFoundError: No module named 'analysis'. Here analysis is the module in which the Analysis class is defined.

Do I really need to recreate the whole class on the cluster just to use a static method? Can anyone suggest an elegant fix to this problem, or a better way of implementing this functionality? It needs to work with any static method. FYI, one of the static methods uses from sklearn.metrics.cluster import adjusted_rand_score, just in case that affects a solution using dill.

I'm the dill author. dill is able to pass a class method to another computer, as seen below.

>$ python
Python 3.5.6 (default, Sep 20 2018, 12:15:10) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo(object):
...   def bar(self, x):
...     return self.y + x
...   def __init__(self, y):
...     self.y = y
... 
>>> import dill
>>>          
>>> f = Foo(5)
>>>                  
>>> with open('foo.pkl', 'wb') as pkl:
...   dill.dump(f.bar, pkl)
... 
>>>

Then in a new session (or on another computer)...

>$ python
Python 3.5.6 (default, Sep 20 2018, 12:15:10) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('foo.pkl', 'rb') as pkl:
...   b = dill.load(pkl)
... 
>>> b(4)
9

Without more specific code from you, it's hard to say why you aren't seeing this behavior... but dill does provide the ability to pass a class definition (or just a class method) to another computer.

This behavior is what enables code like pathos to pass the class method to another computer within a ParallelPool or a ProcessPool; the latter works across processes, while the former can work across distributed resources.

dude>$ python
Python 3.5.6 (default, Sep 20 2018, 12:15:10) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> class Foo(object):
...   def bar(self, x):
...     return self.y + x
...   def __init__(self, y):
...     self.y = y
... 
>>> import pathos
>>> p = pathos.pools.ParallelPool()
>>> p.map(Foo(4).bar, [1,2,3])
[5, 6, 7]
>>> p.close(); p.join()
>>>
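
As for the ModuleNotFoundError in the question, one plausible cause (an assumption, since the real code is not shown) is that the class lives in an importable module rather than in __main__. To my understanding, dill then behaves like the standard pickle module: it stores only a reference (module name plus qualified name) and re-imports the module on load. The sketch below illustrates that mechanism with the standard library's pickle rather than dill itself. The module name analysis_demo and the method static_method are hypothetical stand-ins for the question's analysis module.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical stand-in for the question's analysis.py; all names are illustrative.
MODULE_SOURCE = '''
class Analysis:
    @staticmethod
    def static_method(input1, input2):
        return input1 + input2
'''


def demo():
    """Pickle a static method by reference, then load it with and without its module."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "analysis_demo.py"), "w") as fh:
            fh.write(MODULE_SOURCE)

        # "Local computer": the module is importable, so dumping succeeds.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        module = importlib.import_module("analysis_demo")
        payload = pickle.dumps(module.Analysis.static_method)

        # "Cluster" without the module: hide it, then try to load the pickle.
        sys.path.remove(tmp)
        del sys.modules["analysis_demo"]
        importlib.invalidate_caches()
        try:
            pickle.loads(payload)
            error = None
        except ModuleNotFoundError as exc:
            error = str(exc)

        # "Cluster" with the module deployed on its path: loading now works.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            restored = pickle.loads(payload)
            result = restored(2, 3)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("analysis_demo", None)

    return payload, error, result


if __name__ == "__main__":
    payload, error, result = demo()
    print(payload)  # contains the module and method names, not the function body
    print(error)
    print(result)
```

If this is indeed the cause, the simplest remedy is to make the module file available on the cluster's Python path. The REPL sessions above avoid the problem because Foo is defined in __main__, which dill serializes by value.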
