
How does distribution work in dask?

I have a DataFrame:

import numpy as np
import pandas as pd
import dask.dataframe as dd
a = {'b':['cat','bat','cat','cat','bat','No Data','bat','No Data'],
     'c':['str1','str2','str3', 'str4','str5','str6','str7', 'str8']
    }
df11 = pd.DataFrame(a,index=['x1','x2','x3','x4','x5','x6','x7','x8'])

I tried extracting each element row by row with a lambda function on a regular pandas DataFrame, like this:

def elementsearch(term1, term2):
    print(term1, term2 )
    return term1

df11.apply(lambda x: elementsearch(x.b,x.c), axis =1)

This works fine. But when I use the dask library:

ddf = dd.from_pandas(df11,npartitions=8)
ddf.map_partitions(lambda df : df.apply(lambda x : elementsearch((x.b,x.c),axis=1)))

It raises the following error:

ValueError: Metadata inference failed in `lambda`.

You have supplied a custom function and Dask is unable to 
determine the type of output that that function returns. 

To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.

Original error is below:
------------------------
AttributeError("'Series' object has no attribute 'c'", 'occurred at index b')

Traceback:
---------
  File "/opt/conda/lib/python3.6/site-packages/dask/dataframe/utils.py", line 137, in raise_on_meta_error
    yield
  File "/opt/conda/lib/python3.6/site-packages/dask/dataframe/core.py", line 3477, in _emulate
    return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
  File "<ipython-input-198-8857a48ba1e5>", line 2, in <lambda>
    ddf.map_partitions(lambda df : df.apply(lambda x : elementsearch((x.b,x.c),axis=1)))
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py", line 6014, in apply
    return op.get_result()
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/apply.py", line 318, in get_result
    return super(FrameRowApply, self).get_result()
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/apply.py", line 142, in get_result
    return self.apply_standard()
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/apply.py", line 248, in apply_standard
    self.apply_series_generator()
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/apply.py", line 277, in apply_series_generator
    results[i] = self.f(v)
  File "<ipython-input-198-8857a48ba1e5>", line 2, in <lambda>
    ddf.map_partitions(lambda df : df.apply(lambda x : elementsearch((x.b,x.c),axis=1)))
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)

I have already looked at this Stack Overflow question, but it did not work for me: On Dask DataFrame.apply(), receiving n rows of value 1 before actual rows processed

How can I solve this?

I suggest simply using the apply method on the dask DataFrame, just as you do in your pandas code:

ddf.apply(lambda x: elementsearch(x.b, x.c), axis=1)

