I have a groupby that works for me without the meta argument. It outputs what I want, but I would like to add column names and get a DataFrame instead of a Series as output.
I am trying to run the following code:
jmin = client.persist(j1.loc[:10000])
import pandas as pd
import numpy as np
def unique(d):
    return len(d.loc[:,['id']].drop_duplicates())
meta=pd.DataFrame(columns=['ids_per_mac'])
meta = meta.astype({'ids_per_mac': np.int64})  # astype returns a new object, so assign the result
uu = client.persist(jmin.groupby(jmin['mac_address']).apply(unique,meta=meta))
The execution fails:
Any idea why this groupby apply is not working, compared with the version without meta?
uu = client.persist(jmin.groupby(jmin['mac_address']).apply(unique))
Adding meta= tells dask.dataframe what it should expect from your function. It is a way to be polite to dask.dataframe so that it can continue to operate lazily without having to call your code to determine what your function returns.
Unfortunately, it is not a way to convert your data automatically. You will still have to rely on the normal Pandas API for this.
You might consider the .to_frame() method to convert a Series into a DataFrame.
You are probably also aware, but calling groupby.apply is much slower than using a builtin reduction like groupby.nunique or groupby.aggregate.