Operating on columns in a pandas groupby

Question

Assume I have a DataFrame df which has 4 columns col = ["id","date","basket","gender"] and a function

def is_valid_date(df):
    # some_scalar_function and some_date are placeholders
    idx = some_scalar_function(df["basket"])  # returns an index
    date = df["date"].values[idx]
    return date > some_date

I have always understood groupby as "creating a new DataFrame" for each group in the split step of "split-apply-combine" (loosely speaking), so if I want to apply is_valid_date to each id group, I would assume I could do

df.groupby("id").agg(is_valid_date)

but it throws KeyError: 'basket' at the line idx = some_scalar_function(df["basket"]).
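The "new DataFrame per group" picture does hold for the split step itself. A minimal sketch with a small made-up DataFrame (the values are illustrative, not from the question): iterating over a GroupBy yields each group as its own DataFrame containing all four columns. Whether a function sees that whole DataFrame depends on which GroupBy method calls it.

```python
import pandas as pd

# Made-up data with the question's four columns (illustrative values only)
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "date": pd.to_datetime(["2020-01-01", "2020-03-01", "2020-02-01", "2020-05-01"]),
    "basket": [3, 7, 4, 1],
    "gender": ["f", "f", "m", "m"],
})

# The "split" step: iterating a GroupBy yields (key, sub-DataFrame) pairs
group_types = {}
group_columns = {}
for key, group in df.groupby("id"):
    group_types[key] = type(group).__name__
    group_columns[key] = list(group.columns)
    print(key, type(group).__name__, list(group.columns))
```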

Answer 1

GroupBy.agg works on each column separately: your function receives one column of a group (a Series) at a time, so you cannot select columns inside it with df["basket"] or df["date"].
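A minimal sketch of this behavior, using the same kind of made-up frame. record_type and needs_basket are hypothetical helpers written only to show what agg passes in: every call receives a Series (one column of one group), and indexing that Series with "basket" raises the same KeyError as in the question.

```python
import pandas as pd

# Made-up data with the question's four columns (illustrative values only)
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "date": pd.to_datetime(["2020-01-01", "2020-03-01", "2020-02-01", "2020-05-01"]),
    "basket": [3, 7, 4, 1],
    "gender": ["f", "f", "m", "m"],
})

seen = []

def record_type(x):
    # Hypothetical helper: log the type agg passes in, then return a scalar
    seen.append(type(x).__name__)
    return len(x)

df.groupby("id").agg(record_type)
print(set(seen))  # every call received a single column

def needs_basket(x):
    # Mirrors df["basket"] in the question; x is one column, so this fails
    return x["basket"]

raised_key_error = False
try:
    df.groupby("id").agg(needs_basket)
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)
```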

The solution is to use GroupBy.apply with your custom function; apply passes each group to the function as a whole DataFrame:

df.groupby("id").apply(is_valid_date)
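A runnable sketch of the apply approach. some_scalar_function and some_date are placeholders in the question, so the versions below are assumptions: the function returns the position of the largest basket in each group, and the cutoff date is arbitrary. Selecting the two needed columns before apply is an extra step not in the answer; it keeps the grouping column out of each group and avoids the deprecation warning pandas 2.2+ emits when apply operates on grouping columns.

```python
import numpy as np
import pandas as pd

# Made-up data with the question's four columns (illustrative values only)
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "date": pd.to_datetime(["2020-01-01", "2020-03-01", "2020-02-01", "2020-05-01"]),
    "basket": [3, 7, 4, 1],
    "gender": ["f", "f", "m", "m"],
})

# Hypothetical stand-in for the question's placeholder cutoff date
some_date = np.datetime64("2020-02-15")

def some_scalar_function(basket):
    # Hypothetical stand-in: position of the largest basket within the group
    return int(np.argmax(basket.to_numpy()))

def is_valid_date(group):
    # group is the whole sub-DataFrame for one id, so both columns exist
    idx = some_scalar_function(group["basket"])
    date = group["date"].values[idx]
    return date > some_date

# Select only the needed columns, then call the function once per group
result = df.groupby("id")[["date", "basket"]].apply(is_valid_date)
print(result)
```

Because is_valid_date returns one scalar per group, apply combines the results into a boolean Series indexed by id.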

Answer 1 (accepted, 2020-06-30 08:11:35)
