Multiprocessing a Python function

Question

How can I apply multiprocessing to a function? I tried the approach below, but it did not work.

def steric_clashes_parallel(system):
    rna_st = system[MolWithResID("G")].molecule()
    for i in system.molNums():
        peg_st = system[i].molecule()
        if rna_st != peg_st:
            print(peg_st)
            for i in rna_st.atoms(AtomIdx()):
                for j in peg_st.atoms(AtomIdx()):
#                    print(Vector.distance(i.evaluate().center(), j.evaluate().center()))
                    dist = Vector.distance(i.evaluate().center(), j.evaluate().center())
                    if dist<2:
                        return print("there is a steric clash")
    return print("there is no steric clashes")  

mix = PDB().read("clash_1.pdb")
system = System()
system.add(mix)    
from multiprocessing import Pool
p = Pool(4)
p.map(steric_clashes_parallel,system)

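I have a thousand PDB (system) files to test with this function. Without the multiprocessing module, a single file took 2 hours on one core. Any suggestions would be a great help.

My traceback looks like this: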

self.run()
File "/home/sajid/sire.app/bundled/lib/python3.3/threading.py", line 858,
  in run self._target(*self._args, **self._kwargs)
    File "/home/sajid/sire.app/bundled/lib/python3.3/multiprocessing/pool.py", line 351,
      in _handle_tasks put(task)
        File "/home/sajid/sire.app/bundled/lib/python3.3/multiprocessing/connection.py", line 206,
          in send ForkingPickler(buf, pickle.HIGHEST_PROTOCOL).dump(obj)
RuntimeError: Pickling of "Sire.System._System.System" instances is not enabled
(boost.org/libs/python/doc/v2/pickle.html)

Answer 1

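The problem is that Sire.System._System.System cannot be serialized, so it cannot be sent to the child processes. multiprocessing uses the pickle module for serialization, and you can usually verify this with a sanity check in the main program: call pickle.dumps(my_mp_object) and see whether it raises.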
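The sanity check can be wrapped in a small helper. This is a minimal sketch; the name is_picklable is made up for illustration and is not part of pickle or Sire.

```python
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        # Boost.Python objects without pickle support raise RuntimeError;
        # other unpicklable objects raise PicklingError, TypeError, etc.
        return False
    return True


if __name__ == "__main__":
    print(is_picklable("clash_1.pdb"))   # a filename string is a safe payload
    print(is_picklable(lambda x: x))     # lambdas cannot be pickled
```

Anything that fails this check should be built inside the worker rather than passed to it.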

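However, you have another problem (or so I think, judging by the variable names). The map method takes an iterable and fans its items out to the pool workers, one call per item, but it looks like you want to process system itself rather than the things you get by iterating over it.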
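To make the map behaviour concrete, here is a small sketch with a stand-in worker (describe and run are hypothetical names, and the worker only formats a string). Each item of the iterable becomes one call to the worker, so the iterable should be the list of things you want processed independently.

```python
from multiprocessing import Pool


def describe(item):
    # Stand-in for real work: called once per item of the iterable.
    return "processed {}".format(item)


def run(items, workers=2):
    with Pool(workers) as pool:
        # One task per item; results come back in input order.
        return pool.map(describe, items, chunksize=1)


if __name__ == "__main__":
    print(run(["clash_1.pdb", "clash_2.pdb", "clash_3.pdb"]))
```

Passing a single System object to map would instead iterate over whatever that object yields, which is not what the function expects.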

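One trick with multiprocessing is to keep the payload sent from parent to child simple and let the child do the heavy lifting of creating its own objects. Here, you are better off sending down the filenames and letting the children do most of the work.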

# Assumes the same Sire imports as the question (PDB, System, MolWithResID,
# AtomIdx, Vector), which the original post does not show.
from multiprocessing import Pool

def steric_clashes_from_file(filename):
    # Build the unpicklable Sire objects inside the worker process.
    mix = PDB().read(filename)
    system = System()
    system.add(mix)
    return steric_clashes_parallel(system)

def steric_clashes_parallel(system):
    rna_st = system[MolWithResID("G")].molecule()
    for molnum in system.molNums():
        peg_st = system[molnum].molecule()
        if rna_st != peg_st:
            print(peg_st)
            for i in rna_st.atoms(AtomIdx()):
                for j in peg_st.atoms(AtomIdx()):
                    dist = Vector.distance(i.evaluate().center(), j.evaluate().center())
                    if dist < 2:
                        print("there is a steric clash")
                        return True
    print("there are no steric clashes")
    return False

if __name__ == "__main__":
    filenames = ["clash_1.pdb"]  # extend with the rest of the files
    with Pool(4) as p:
        # chunksize is an argument of map(), not of Pool()
        results = p.map(steric_clashes_from_file, filenames, chunksize=1)

Answer 2

@martineau: I tested the pickle command and it gave me:

 ----> 1 pickle.dumps(clash_1.pdb)
    RuntimeError: Pickling of "Sire.Mol._Mol.MoleculeGroup" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)
    ----> 1 pickle.dumps(system)
    RuntimeError: Pickling of "Sire.System._System.System" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)

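Using your script took the same amount of time and used only one core. The dist line is evaluated inside a loop, though. Can I run that line on multiple cores? I modified it to: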

for i in rna_st.atoms(AtomIdx()):
    icent = i.evaluate().center()
    for j in peg_st.atoms(AtomIdx()):
        dist = Vector.distance(icent, j.evaluate().center())
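The same hoisting idea can be taken a step further on a single core. The sketch below is an illustration only: it assumes the atom centers have already been extracted into plain (x, y, z) tuples (that extraction step is not shown and has_clash is a hypothetical helper), compares squared distances to avoid a square root per pair, and stops at the first clash.

```python
def has_clash(coords_a, coords_b, cutoff=2.0):
    """Return True if any pair of points is closer than cutoff."""
    cutoff_sq = cutoff * cutoff  # compare squared distances, no sqrt needed
    for ax, ay, az in coords_a:
        for bx, by, bz in coords_b:
            dx, dy, dz = ax - bx, ay - by, az - bz
            if dx * dx + dy * dy + dz * dz < cutoff_sq:
                return True  # early exit on the first clash
    return False


if __name__ == "__main__":
    rna = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    peg = [(1.0, 0.0, 0.0)]
    print(has_clash(rna, peg))
```

This does not spread the work over cores by itself; it only makes each pair check cheaper.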

Answer 3

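There is a trick you can use to compute each file faster: process the files one after another, but process the contents of each file in parallel. This depends on a couple of caveats:

You are running on a system that can fork processes (such as Linux).
The calculations you are doing have no side effects that affect the results of later calculations.

This appears to be true in your case, but I cannot be 100% sure.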

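When a process is forked, all of the parent's memory is copied into the child (more importantly, it is copied efficiently: memory that is only read from is not duplicated). This makes it easy to share large, complex initial state between processes. However, once the child processes have started, they will not see any changes made to objects in the parent process (and vice versa).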
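A small self-contained demonstration of that behaviour, assuming a fork-capable OS and Python 3.4 or later (for multiprocessing.get_context). The names shared, child and demo are made up for this sketch.

```python
import multiprocessing

# Module-level state, filled in by the parent before forking.
shared = {"value": None}


def child(queue):
    # The child sees the value the parent set before the fork...
    queue.put(shared["value"])
    # ...but this assignment only changes the child's own copy.
    shared["value"] = "changed in child"


def demo():
    ctx = multiprocessing.get_context("fork")  # not available on Windows
    shared["value"] = "set in parent"
    queue = ctx.Queue()
    proc = ctx.Process(target=child, args=(queue,))
    proc.start()
    seen_by_child = queue.get()
    proc.join()
    # Second element is the parent's view after the child has finished.
    return seen_by_child, shared["value"]


if __name__ == "__main__":
    print(demo())
```

Both values come back as the parent's original string: the child inherited the state, and its later change never reached the parent.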

Sample code:

import multiprocessing

system = None
rna_st = None

class StericClash(Exception):
    """Exception used to halt processing of a file. Could be modified to 
    include information about what caused the clash if this is useful."""
    pass


def steric_clashes_parallel(system_index):
    # Only a plain int crosses the process boundary; the MolNum is looked up
    # in the forked copy of system (assumes molNums() returns an indexable
    # list, as the question's loop over it suggests).
    peg_st = system[system.molNums()[system_index]].molecule()
    if rna_st != peg_st:
        for i in rna_st.atoms(AtomIdx()):
            for j in peg_st.atoms(AtomIdx()):
                dist = Vector.distance(i.evaluate().center(),
                    j.evaluate().center())
                if dist < 2:
                    raise StericClash()


def process_file(filename):
    global system, rna_st

    # initialise global values before creating pool
    mix = PDB().read(filename)
    system = System()
    system.add(mix)
    rna_st = system[MolWithResID("G")].molecule()

    with multiprocessing.Pool() as pool:
        # contents of file processed in parallel
        try:
            # molNums() returns a collection, so range() needs its length
            pool.map(steric_clashes_parallel, range(len(system.molNums())))
        except StericClash:
            # terminate called to halt current jobs and further processing 
            # of file
            pool.terminate()
            # wait for pool processes to terminate before returning
            pool.join()
            return False
        else:
            pool.close()
            pool.join()
            return True
        finally:
            # reset globals
            system = rna_st = None

if __name__ == "__main__":
    for filename in get_files_to_be_processed():
        # files are being processed in serial
        result = process_file(filename)
        save_result_to_disk(filename, result)
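The two helpers called in the main block, get_files_to_be_processed and save_result_to_disk, are placeholders that the answer leaves undefined. One possible sketch, assuming the PDB files sit in a single directory and that a tab-separated log file is good enough (the default pattern and the log file name are assumptions):

```python
import glob
import os


def get_files_to_be_processed(directory=".", pattern="*.pdb"):
    """Return the matching files in a stable, sorted order."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


def save_result_to_disk(filename, result, log_path="clash_results.tsv"):
    """Append one line per file; result is True when no clash was found."""
    label = "no clash" if result else "clash"
    with open(log_path, "a") as handle:
        handle.write("{}\t{}\n".format(filename, label))
```

Appending after every file means a crash partway through a thousand files does not lose the results already computed.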
