Pandas: How do you group data together by date and apply multiple functions to the grouped data?
In my code I have a pandas dataframe with a column for the day and a column called value. I would like to group the dataframe by day, find the minimum and maximum value for that day, average the min and max, and then subtract that average from the value column in the dataframe.
The closest thing I have been able to do is:
temp_max = var.groupby(['day']).max()
temp_min = var.groupby(['day']).min()
answer = var.groupby(['day'])['value'].apply(lambda x : x - (temp_max['value'] - temp_min['value']) / 2 )
Input:
Unnamed: 0 hrs vt rt value
0 119899 1 2017-03-01 07:00:00 2017-03-01 06:00:00 67.910011
1 119900 2 2017-03-01 08:00:00 2017-03-01 06:00:00 52.970033
2 119901 3 2017-03-01 09:00:00 2017-03-01 06:00:00 49.010011
3 119902 4 2017-03-01 10:00:00 2017-03-01 06:00:00 47.030000
4 119903 5 2017-03-01 11:00:00 2017-03-01 06:00:00 45.949989
5 119904 6 2017-03-01 12:00:00 2017-03-01 06:00:00 45.949989
Output:
1 0 NaN
1 41.540022
2 31.549989
3 29.570005
4 36.949989
5 38.030000
6 40.010011
7 33.980000
8 47.030000
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
2 1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
17 NaN
18 NaN
19 NaN
20 NaN
21 NaN
...
6 4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
53 NaN
54 NaN
55 NaN
56 NaN
7 1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
57 NaN
58 NaN
59 NaN
60 NaN
8 1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
61 NaN
The values appear to be correct, but I was hoping to keep my original dataframe and just update the values in place. Is there a different way I should be approaching this?
Thanks in advance!
How about something like this?
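import pandas as pd

frames = []
for day, frame in var.groupby('day'):
    frame = frame.copy()  # work on a copy so the group slice of var is not modified
    frame['value'] = frame['value'] - (frame['value'].max() + frame['value'].min()) / 2
    frames.append(frame)
new_frame = pd.concat(frames)  # DataFrame.append was removed in pandas 2.0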
You could do it in one line using a list comprehension and groupby, but it looks a bit ugly:
var.loc[:,'value'] = pd.concat([frm.value.apply(lambda x:x-(frm.value.min() + frm.value.max())/2) for d,frm in var.groupby('day')])
I believe that would accomplish what you're trying to do, although it isn't particularly readable!
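Another option, as a minimal sketch: groupby(...).transform returns a result with the same index as the original frame, so the per-day midpoint can be subtracted directly and the value column updated in place. The sample frame here is made up for illustration; only the 'day' and 'value' column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
var = pd.DataFrame({
    "day": [1, 1, 1, 2, 2],
    "value": [10.0, 20.0, 30.0, 5.0, 15.0],
})

grouped = var.groupby("day")["value"]

# transform broadcasts each group's max/min back onto the original rows,
# so the result lines up with var's index and no NaN is introduced.
midpoint = (grouped.transform("max") + grouped.transform("min")) / 2

# Overwrite the column in the original frame.
var["value"] = var["value"] - midpoint
print(var)
```

Because the result is aligned row by row, this avoids both the explicit loop and the concat step, and the rest of the dataframe is left untouched.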