Using multiple lambda functions with a pandas DataFrame

Question

I have a pandas DataFrame in which each value of the "process_id" column has, over multiple time steps, different parameters associated with it. I want to extract several summary statistics from these and put them into a new DataFrame (so I don't have to work with the full detail of the data). Below is an example of what I mean: for each "process_id" I keep the min, max, mean and std of each parameter, and I also define a lambda function to save the mean of each parameter over the last 5 time steps:

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean()])

This works fine, and the lambda function's result shows up in the table under a name like "parameter_lambda" (not sure how, but it works). Now the problem is that if I want to add another lambda function, something like this (or any other lambda definition):

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(),lambda x: x.iloc[0:int(len(df)/5)].mean()])

I get this error:

Function names must be unique, found multiple named <lambda>

Which makes sense, as both lambda functions will have the same name in the data frame. But I don't know how to get around this.
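The name collision can be reproduced without pandas. The sketch below is only an illustration (the variable names `first` and `second` are made up here, and `head(5)` stands in for the slice in the question): every lambda expression carries the same placeholder `__name__`, regardless of what its argument is called.

```python
# Two lambdas with different argument names and different bodies.
first = lambda x: x.tail(5).mean()
second = lambda y: y.head(5).mean()

# Python labels every lambda expression '<lambda>', so the names collide.
names = [first.__name__, second.__name__]
print(names)
```

Because the label comes from `__name__` and not from the argument, renaming `x` to `y` leaves both functions with the same name.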

I tried something like this:

df.groupby('dummy').agg({'returns':{'Mean': np.mean, 'Sum': np.sum}})

as described here, but I am getting this error:

SpecificationError: cannot perform renaming for returns with a nested dictionary

Can someone help me? Thank you!

Answer 1

Lambda functions cause the duplicate-name error whenever more than one column is created by a lambda, because they all share the same name. Give each function its own __name__ before passing it to agg:

fuc1=lambda x: x.tail(5).mean()
fuc1.__name__ = 'tail_mean'

fuc2=lambda x: x.iloc[0:int(len(df)/5)].mean()
fuc2.__name__ = 'len_mean'

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', fuc1,fuc2])
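A self-contained sketch of the same idea on toy data follows. The column `pressure` and its values are hypothetical, and the functions are written with `def` instead of assigning `__name__` to a lambda, which gives them distinct names automatically. Note one deliberate difference, labeled as an assumption: the second function slices by the length of each group (`len(x)`) where the snippet above uses the length of the whole frame (`len(df)`), on the guess that a per-group fraction was intended.

```python
import pandas as pd

# Hypothetical toy data: two processes, six time steps each, one parameter.
df = pd.DataFrame({
    "process_id": [1] * 6 + [2] * 6,
    "pressure": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0,
                 10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

def tail_mean(x):
    """Mean of the last 5 time steps of one group."""
    return x.tail(5).mean()

def head_mean(x):
    """Mean of the first fifth of the group's rows (at least one row).

    Assumption: uses len(x), the group's length, not len(df).
    """
    n = max(1, len(x) // 5)
    return x.iloc[:n].mean()

# Each function's __name__ becomes the second level of the column index.
features = df.groupby("process_id").agg(
    ["min", "max", "mean", "std", tail_mean, head_mean]
)
print(features)
```

The resulting columns form a two-level index, so a single statistic can be read with, for example, `features[("pressure", "tail_mean")]`.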

Answer 2

Try with x and y instead of x and x:

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(), lambda y: y.iloc[0:int(len(df)/5)].mean()])

Also, try this:

df.groupby('dummy').agg({'returns': [np.mean, np.sum]})
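For the custom column names that the nested dictionary in the question was aiming for ('Mean' and 'Sum'), one option is named aggregation, which is available in pandas 0.25 and later (newer than the release current when this thread was written). The sketch below uses made-up values for the `dummy` and `returns` columns and is only one possible way to do the renaming.

```python
import pandas as pd

# Hypothetical data matching the column names used in the question.
df = pd.DataFrame({
    "dummy": ["a", "a", "b"],
    "returns": [1.0, 3.0, 5.0],
})

# Named aggregation: output_name=(input_column, aggregation).
summary = df.groupby("dummy").agg(
    Mean=("returns", "mean"),
    Sum=("returns", "sum"),
)
print(summary)
```

The keyword on the left becomes the output column name, so no nested dictionary is needed.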

Answer 1
Score: 4, accepted, 2019-02-10 19:21:08

Answer 2
Score: 0, 2019-02-10 19:16:59
