Operate on columns in pandas groupby
Assume I have a dataframe df with 4 columns, col = ["id", "date", "basket", "gender"], and a function
def is_valid_date(df):
    idx = some_scalar_function(df["basket"])  # returns an index
    date = df["date"].values[idx]
    return date > some_date
I have always understood groupby as the "creation of a new dataframe" during the split step of "split-apply-combine" (loosely speaking), so if I want to apply is_valid_date to each group of id, I would assume I could do
df.groupby("id").agg(is_valid_date)
but it throws KeyError: 'basket' at the line idx = some_scalar_function(df["basket"]).
If you use GroupBy.agg, it works on each column separately: the function receives one column of the group at a time as a Series, not the group's DataFrame, so it cannot select columns like df["basket"] or df["date"].
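To see this for yourself, here is a minimal sketch that records what type of object agg hands to the function. The sample data and the helper name record_type are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "basket": [10, 30, 50, 20],
    "gender": ["F", "F", "M", "M"],
})

seen_types = []

def record_type(x):
    # Hypothetical helper: remember the type of object agg passes in,
    # then return a scalar so the aggregation itself succeeds.
    seen_types.append(type(x).__name__)
    return x.iloc[0]

df.groupby("id").agg(record_type)
print(set(seen_types))
```

Because each call gets a single column, an expression such as x["basket"] inside the function is a label lookup on that column's index, which is where the KeyError comes from.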
The solution is to use GroupBy.apply with your custom function:
df.groupby("id").apply(is_valid_date)
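For a runnable illustration, here is a small sketch under some assumptions: the sample data, the cutoff some_date, and the body of some_scalar_function (here, the position of the largest basket) are all hypothetical stand-ins, since the question does not define them.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2020-01-01", "2020-03-01", "2019-05-01", "2019-06-01"]
    ),
    "basket": [10, 30, 50, 20],
    "gender": ["F", "F", "M", "M"],
})

some_date = pd.Timestamp("2020-01-01")  # hypothetical cutoff

def some_scalar_function(basket):
    # Hypothetical stand-in: position of the largest basket in the group.
    return int(np.argmax(basket.to_numpy()))

def is_valid_date(group):
    # `group` is a DataFrame holding the rows of one id,
    # so several columns can be used together.
    idx = some_scalar_function(group["basket"])
    date = group["date"].iloc[idx]
    return date > some_date

# Selecting the needed columns first keeps the grouping column
# out of the frames passed to the function.
result = df.groupby("id")[["date", "basket"]].apply(is_valid_date)
print(result)
```

Since the function returns one scalar per group, the result is a Series indexed by id. Selecting ["date", "basket"] before apply is a design choice of this sketch, not something the original answer requires; it simply makes explicit which columns the function needs.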