
Using multiple lambda functions with a pandas dataframe

I have a pandas DataFrame in which, for each value of the "process_id" column, several parameters are recorded over multiple time steps. I want to extract some summary information from these and put it into a new DataFrame (so I don't have to work with all the details of the data). Below is an example of what I mean: for each "process_id" I keep the min, max, mean and std of each parameter, and I also define a lambda function to save the mean of each parameter over the last 5 time steps:

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean()])

This works fine, and the lambda function shows up in the table under a name like "parameter_lambda" (not sure how, but it works). Now the problem is that if I want to add another lambda function, something like this (or any other lambda definition):

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(),lambda x: x.iloc[0:int(len(df)/5)].mean()])

I get this error:

Function names must be unique, found multiple named <lambda>

Which makes sense, as both lambda functions will have the same name in the data frame. But I don't know how to get around this.

I tried something like this:

df.groupby('dummy').agg({'returns':{'Mean': np.mean, 'Sum': np.sum}})

as described here, but I am getting this error:

SpecificationError: cannot perform renaming for returns with a nested dictionary

Can someone help me? Thank you!

Lambda functions all share the same name, so you get a duplicate-name error as soon as more than one of the output columns is created by a lambda. Giving each lambda its own __name__ avoids the clash:

# Give each lambda a unique __name__ so that agg() can label its column
fuc1 = lambda x: x.tail(5).mean()
fuc1.__name__ = 'tail_mean'

fuc2 = lambda x: x.iloc[0:int(len(df)/5)].mean()
fuc2.__name__ = 'len_mean'

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', fuc1, fuc2])
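
For anyone who wants to try the renaming trick on something small, here is a minimal sketch. The toy data, the `temperature` column and the names `tail_mean` / `head_mean` are made up for illustration (the question does not show its real columns), and a single column is selected only to keep the output flat.

```python
import pandas as pd

# Toy data; 'temperature' is a hypothetical parameter column.
df = pd.DataFrame({
    'process_id': [1, 1, 1, 2, 2, 2],
    'temperature': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

tail_mean = lambda x: x.tail(2).mean()  # mean of the last 2 time steps
head_mean = lambda x: x.head(2).mean()  # mean of the first 2 time steps

# Before renaming, both functions report the same name.
print(tail_mean.__name__, head_mean.__name__)  # <lambda> <lambda>

# agg() labels each output column with the function's __name__.
tail_mean.__name__ = 'tail_mean'
head_mean.__name__ = 'head_mean'

features = df.groupby('process_id')['temperature'].agg(['mean', tail_mean, head_mean])
print(features.columns.tolist())  # ['mean', 'tail_mean', 'head_mean']
```

Using a plain `def tail_mean(x): ...` instead of a lambda should give the same column label without touching `__name__`.
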
Try with x and y instead of x and x:

features = df.groupby('process_id').agg(['min', 'max', 'mean', 'std', lambda x: x.tail(5).mean(), lambda y: y.iloc[0:int(len(df)/5)].mean()])
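
Note that the parameter name is not part of a lambda's `__name__`, so whether this variant runs probably depends on the pandas version rather than on x versus y. A hedged alternative, assuming pandas 1.0 or later, is to pass the lambdas as keyword arguments so that each one gets an explicit label. The toy data and the labels `tail_mean` / `head_mean` below are illustrative, not taken from the question.

```python
import pandas as pd

# Toy data; 'temperature' is a hypothetical parameter column.
df = pd.DataFrame({
    'process_id': [1, 1, 1, 2, 2, 2],
    'temperature': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# On a single column, each keyword becomes the label of the output column.
features = df.groupby('process_id')['temperature'].agg(
    tail_mean=lambda x: x.tail(2).mean(),
    head_mean=lambda y: y.head(2).mean(),
)
print(features.columns.tolist())  # ['tail_mean', 'head_mean']
```
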

Also, try this:

df.groupby('dummy').agg({'returns': [np.mean, np.sum]})
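
If the point of the nested dictionary was to get custom column names such as 'Mean' and 'Sum', the list form above will not provide them. A sketch of one way to get them, assuming pandas 0.25 or later where named aggregation is available (the toy values are made up; the column names follow the snippet in the question):

```python
import pandas as pd

# Toy data; column names 'dummy' and 'returns' follow the question's snippet.
df = pd.DataFrame({'dummy': ['a', 'a', 'b'], 'returns': [1.0, 3.0, 5.0]})

# Named aggregation: output_name=(input_column, function)
out = df.groupby('dummy').agg(
    Mean=('returns', 'mean'),
    Sum=('returns', 'sum'),
)
print(out.columns.tolist())  # ['Mean', 'Sum']
```

The result has flat column names rather than a MultiIndex, which is often easier to work with afterwards.
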
