I have a not-so-large DataFrame (somewhere in the 2000x10000 range in terms of shape).
I am trying to group by a column and average the first N non-null entries, e.g.:
import numpy as np

def my_part_of_interest(v, N=42):
    # Drop NaNs, then average the first N remaining values
    valid = v[~np.isnan(v)]
    return np.mean(valid.values[0:N])

mydf.groupby('key').agg(my_part_of_interest)
It now takes a long time (dozens of minutes), whereas .agg(np.nanmean) ran on the order of seconds.
How can I get it running faster?
Some things to consider:
mydf.dropna(subset=['v'], inplace=True)
mydf.groupby('key').apply(lambda x: x.head(42).agg('mean'))
I think those two combined can speed things up a bit, and they are more idiomatic pandas: dropping the NaNs once up front means each group no longer has to filter them out itself.
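A minimal end-to-end sketch of that idea, assuming your frame has a grouping column 'key' and a value column 'v' as in the snippets above (the toy data here is made up purely for illustration):

import numpy as np
import pandas as pd

# Toy frame for illustration only: two groups, with some NaNs in 'v'
mydf = pd.DataFrame({
    'key': ['a', 'a', 'a', 'b', 'b', 'b'],
    'v':   [1.0, np.nan, 3.0, np.nan, 5.0, 7.0],
})

N = 42  # same cutoff as in the question

# Drop rows where 'v' is NaN once, up front, instead of per group
clean = mydf.dropna(subset=['v'])

# Average the first N non-null values of 'v' in each group
result = clean.groupby('key')['v'].apply(lambda s: s.head(N).mean())
print(result)
# key
# a    2.0
# b    6.0

If the .apply is still too slow, the per-group lambda can be avoided entirely with something like clean.groupby('key').head(N).groupby('key')['v'].mean(), since groupby(...).head(N) keeps the first N rows of every group in one vectorized pass and the second groupby reduces to a plain built-in mean.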