My dask code seems to be working in multithreading mode but fails in multiprocessing mode
When I try to do a groupby with dask, the code fails:
other_df = ddf.groupby(
    by=[self.phone_field, self.state_field]
).apply(
    lambda x: self.obtain_cluster_nos_weighted_levenshtein(x.copy()),
    meta={self.address_id_field: "f8",
          self.add_clust_field: "i8"}
).compute(scheduler='processes')
Here is the traceback:
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/dask/local.py", line 461, in fire_task
dumps((dsk[key], data)),
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 102, in dumps
cp.dump(obj)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
return Pickler.dump(self, obj)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 437, in dump
self.save(obj)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
save(element)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 789, in save_tuple
save(element)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 745, in save_function
*self._dynamic_function_reduce(obj), obj=obj
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 687, in _save_reduce_pickle5
save(state)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
save(element)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
self._batch_setitems(obj.items())
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
save(v)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
save(element)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 549, in save
self.save_reduce(obj=obj, *rv)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 638, in save_reduce
save(args)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
save(element)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 549, in save
self.save_reduce(obj=obj, *rv)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
save(state)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
f(self, obj) # Call unbound method with explicit self
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
self._batch_setitems(obj.items())
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
save(v)
File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 524, in save
rv = reduce(self.proto)
NotImplementedError: object proxy must define __reduce_ex__()
I guess this is related to the pickling that happens before tasks are dispatched to the workers. The problem only appears with scheduler='processes'; with multithreading it runs fine. How can I fix this?
Calling a class method for multiprocessing does not seem to be a good idea. I declared self.obtain_cluster_nos_weighted_levenshtein as a standalone function instead, and my problem was solved.