Append rows to a dask dataframe using apply function on another dask dataframe

I want to run the following operation using dask.

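import pandas as pd

# Accumulator that foo() extends one row at a time.
df1 = pd.DataFrame()

def foo(row):
    global df1
    # DataFrame.append returned a new frame and was removed in pandas 2.0,
    # so reassign the result of pd.concat instead.
    df1 = pd.concat([df1, row.to_frame().T])

def main():
    # df2 is assumed to be an existing pandas DataFrame (not shown).
    df2.apply(foo, axis=1)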

When I run this operation without Dask, it runs perfectly fine, but when I convert both of my dataframes to dask, I do not get any data in my df2 dataframe after computing.

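import pandas as pd
from dask.dataframe import from_pandas

df1 = pd.DataFrame()
df1 = from_pandas(df1, npartitions=10)

def foo(row):
    global df1
    df1.append(row)

def main():
    global df1, df2
    # df2 starts as an existing pandas DataFrame (not shown).
    df2 = from_pandas(df2, npartitions=10)
    df2.apply(foo, axis=1, meta=df2)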

I am not sure what I am doing wrong, or whether there is a better way to do this in dask. My main objective is to run the entire code using dask, since the data is quite large to work on.

The result of the .apply method should be assigned to a new dask DataFrame:

df2 = df2.apply(foo, axis=1, meta=df2)

However, this is likely to be inefficient when working with data at scale. What is more efficient will depend on the specific problem being solved.
