
My dask code seems to work in multithreading mode but fails in multiprocessing mode

When I try to do a groupby with dask, the code fails:

other_df = (
    ddf.groupby(by=[self.phone_field, self.state_field])
    .apply(
        lambda x: self.obtain_cluster_nos_weighted_levenshtein(x.copy()),
        meta={self.address_id_field: "f8", self.add_clust_field: "i8"},
    )
    .compute(scheduler="processes")
)

Here is the traceback:

  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/dask/local.py", line 461, in fire_task
    dumps((dsk[key], data)),
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 102, in dumps
    cp.dump(obj)
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 745, in save_function
    *self._dynamic_function_reduce(obj), obj=obj
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 687, in _save_reduce_pickle5
    save(state)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 638, in save_reduce
    save(args)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
NotImplementedError: object proxy must define __reduce_ex__()

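I guess this has to do with pickling the task before it is sent to the workers. The problem only shows up with scheduler='processes'; with multithreading the code runs fine. How can I fix this?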

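Calling a method of a class instance under multiprocessing does not seem to be a good idea. I declared self.obtain_cluster_nos_weighted_levenshtein as a standalone function instead, and that solved my problem.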
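A likely reason, judging from the traceback: the lambda refers to self, so the whole instance has to be pickled along with the function, and something reachable from the instance (the "object proxy" in the error message) does not support pickling. Threads share memory, so nothing is pickled in multithreading mode. The sketch below reproduces that behaviour with the standard library only. The names Clusterer, cluster_values and can_pickle are hypothetical, and the threading.Lock is merely a stand-in for whatever unpicklable attribute the real object holds.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor


def cluster_values(values):
    """Standalone function: depends only on its arguments, not on any instance."""
    return sorted(set(values))


class Clusterer:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self):
        # Assumption: the real instance holds some attribute that cannot be
        # pickled. A lock plays that role here.
        self.resource = threading.Lock()

    def cluster(self, values):
        return cluster_values(values)


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


clusterer = Clusterer()

# Threads share the instance in memory, so no pickling takes place.
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded_result = pool.submit(clusterer.cluster, [3, 1, 3]).result()

# Pickling a bound method pickles the instance it is bound to, lock included.
bound_method_ok = can_pickle(clusterer.cluster)

# A module-level function is pickled by reference and carries no instance state.
standalone_ok = can_pickle(cluster_values)

print(threaded_result, bound_method_ok, standalone_ok)
```

In the dask call the same idea would mean passing a standalone function to apply and handing over any field names it needs as plain arguments, so that self never becomes part of the task that gets pickled.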
