
Dask Groupby-Apply meta failed

I have a groupby that is working for me without using the meta argument. It outputs what I want but I would like to add column names and get a dataframe instead of a Series as an output.

I am trying to run the following code:

jmin = client.persist(j1.loc[:10000])

import pandas as pd
import numpy as np

def unique(d):
    return len(d.loc[:,['id']].drop_duplicates())

meta=pd.DataFrame(columns=['ids_per_mac'])
meta.ids_per_mac.astype(np.int64)

uu = client.persist(jmin.groupby(jmin['mac_address']).apply(unique,meta=meta))

The execution fails:

[Screenshot of the error; image not available]

Any idea why this groupby-apply fails when meta is passed, while the version without meta works?

uu = client.persist(jmin.groupby(jmin['mac_address']).apply(unique))

Adding meta= tells dask.dataframe what it should expect from your function. It is a way to be polite to dask.dataframe so that it can continue to operate lazily without having to call your code to determine what your function returns.

Unfortunately it is not a way to convert your data automatically. You will still have to rely on normal Pandas API for this.

You might consider the .to_frame() method to convert a Series into a DataFrame.
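A short sketch of that conversion, shown on a plain pandas Series for brevity (a dask Series offers a to_frame method with the same name argument). The values below are made up to resemble per-group counts.

```python
import pandas as pd

# Hypothetical per-group counts, shaped like the Series that groupby-apply returns.
counts = pd.Series(
    [2, 1],
    index=pd.Index(["aa", "bb"], name="mac_address"),
)

# to_frame wraps the Series in a one-column DataFrame; name= sets the column label.
frame = counts.to_frame(name="ids_per_mac")
print(frame)
```

The index (mac_address) is kept as is, and the single column takes the name passed to to_frame.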

You are probably already aware of this, but calling groupby.apply is much slower than using a built-in reduction like groupby.nunique or groupby.aggregate.
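To illustrate the point, here is a sketch in plain pandas that compares the apply-based count with the built-in nunique on the same made-up data; dask's groupby exposes nunique on a selected column in the same way. Note one assumption: the data has no missing ids, since nunique skips missing values by default while drop_duplicates keeps them as a distinct value.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
pdf = pd.DataFrame({
    "mac_address": ["aa", "aa", "aa", "bb", "bb"],
    "id": [1, 1, 2, 3, 3],
})

def unique(d):
    # Same logic as in the question: count distinct ids in one group.
    return len(d.loc[:, ["id"]].drop_duplicates())

# Slow path: an arbitrary Python function is called once per group.
via_apply = pdf.groupby("mac_address")[["id"]].apply(unique)

# Built-in reduction: the same counts without a per-group Python call.
via_nunique = pdf.groupby("mac_address")["id"].nunique()

print(via_apply.tolist(), via_nunique.tolist())
```

Chaining .to_frame(name="ids_per_mac") onto the nunique result would give the named one-column DataFrame the question asks for.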

