Dask Groupby-Apply meta failed

Question

I have a groupby-apply that works for me without the meta argument. It outputs what I want, but I would like to name the output column and get a DataFrame instead of a Series.

I am trying to run the following code:

jmin = client.persist(j1.loc[:10000])

import pandas as pd
import numpy as np

def unique(d):
    return len(d.loc[:,['id']].drop_duplicates())

meta=pd.DataFrame(columns=['ids_per_mac'])
meta.ids_per_mac.astype(np.int64)

uu = client.persist(jmin.groupby(jmin['mac_address']).apply(unique,meta=meta))

The execution fails:

Any idea why this groupby-apply fails, while the version without meta works?

uu = client.persist(jmin.groupby(jmin['mac_address']).apply(unique))

Answer 1

Adding meta= tells dask.dataframe what it should expect from your function. It is a way to be polite to dask.dataframe so that it can continue to operate lazily without having to call your code to determine what your function returns.
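To make this concrete, here is a minimal sketch in plain pandas of what the function in the question actually returns, and what a matching description of that output could look like. The sample data and the names pdf, result, series_meta and frame_meta are hypothetical. In Dask, the equivalent description can be passed as a (name, dtype) tuple, as noted in the comment.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `jmin`.
pdf = pd.DataFrame({
    "mac_address": ["aa", "aa", "bb", "bb", "bb"],
    "id": [1, 1, 2, 3, 3],
})

def unique(d):
    # Returns one integer per group, not a DataFrame.
    return len(d.loc[:, ["id"]].drop_duplicates())

# One scalar per group means apply() produces a Series indexed by group key.
result = pdf.groupby("mac_address")[["id"]].apply(unique)

# A description that matches this output: an empty int64 Series.
# With dask.dataframe this could be written as meta=("ids_per_mac", "int64").
series_meta = pd.Series(name="ids_per_mac", dtype="int64")

# The meta from the question describes a DataFrame instead, and the
# astype() call returns a new Series without changing frame_meta,
# so the column keeps the default object dtype.
frame_meta = pd.DataFrame(columns=["ids_per_mac"])
frame_meta.ids_per_mac.astype("int64")

print(type(result).__name__, result.dtype)
print(frame_meta.dtypes["ids_per_mac"])
```

The point of the comparison is that meta has to describe what the function already produces (here a Series of integers); it does not reshape the result into something else.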

Unfortunately, it is not a way to convert your data automatically. You will still have to rely on the normal pandas API for that.

You might consider the .to_frame() method to convert a Series into a DataFrame.
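As a small illustration, shown in plain pandas with made-up values, to_frame turns a Series into a single-column DataFrame and lets you name the column at the same time. The names counts and frame are hypothetical; a dask.dataframe Series exposes a to_frame method as well, so the same call should carry over.

```python
import pandas as pd

# Hypothetical per-group counts, shaped like the output of the groupby-apply.
counts = pd.Series(
    [1, 2],
    index=pd.Index(["aa", "bb"], name="mac_address"),
)

# Convert the Series to a one-column DataFrame and name the column.
frame = counts.to_frame(name="ids_per_mac")

print(frame)
```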

You are probably already aware of this, but calling groupby.apply is much slower than using a built-in reduction such as groupby.nunique or groupby.aggregate.
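A rough sketch of the built-in route, again in plain pandas with hypothetical sample data: nunique counts the distinct id values per mac_address directly, and to_frame names the resulting column. The same method chain exists on dask.dataframe groupby objects, though this has not been run against the asker's data. One difference to keep in mind is that nunique ignores missing values by default, whereas the drop_duplicates approach counts a missing id as one distinct value.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `jmin`.
pdf = pd.DataFrame({
    "mac_address": ["aa", "aa", "bb", "bb", "bb"],
    "id": [1, 1, 2, 3, 3],
})

# Built-in reduction: number of distinct ids per mac_address,
# converted to a DataFrame with a named column.
ids_per_mac = (
    pdf.groupby("mac_address")["id"]
    .nunique()
    .to_frame(name="ids_per_mac")
)

print(ids_per_mac)
```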

solution1
4 ACCEPTED 2017-04-07 22:13:53
