
Mean of a grouped-by pandas dataframe

I need to calculate the mean per day of the columns duration and km, separately for the rows with value == 1 and value == 0.

df
Out[20]: 
                          Date duration km   value
0   2015-03-28 09:07:00.800001    0      0    0
1   2015-03-28 09:36:01.819998    1      2    1
2   2015-03-30 09:36:06.839997    1      3    1 
3   2015-03-30 09:37:27.659997    nan    5    0 
4   2015-04-22 09:51:40.440003    3      7    0
5   2015-04-23 10:15:25.080002    0      nan  1

How can I modify this solution in order to get the means duration_value0, duration_value1, km_value0 and km_value1?

df = df.set_index('Date').groupby(pd.Grouper(freq='d')).mean().dropna(how='all')
print (df)
            duration   km
Date                     
2015-03-28       0.5  1.0
2015-03-30       1.5  4.0
2015-04-22       3.0  7.0
2015-04-23       0.0  0.0

I believe grouping by Date as well as value should do it. Call dfGroupBy.mean followed by df.reset_index to get your desired output:

In [713]: df.set_index('Date')\
           .groupby([pd.Grouper(freq='d'), 'value'])\
           .mean().reset_index(1, drop=True)
Out[713]: 
            duration   km
Date                     
2015-03-28       0.0  0.0
2015-03-28       1.0  2.0
2015-03-30       NaN  5.0
2015-03-30       1.0  3.0
2015-04-22       3.0  7.0
2015-04-23       0.0  NaN
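The output above keeps one row per (day, value) pair but no longer shows which value each row belongs to. As a rough sketch of one way to get the four columns the question asks for, the value level could be unstacked into columns instead of dropped. The sample frame is rebuilt here on the assumption that the nan entries are missing values and that Date is a datetime column; the name daily is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: "nan" means a missing value).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-03-28 09:07:00.800001",
        "2015-03-28 09:36:01.819998",
        "2015-03-30 09:36:06.839997",
        "2015-03-30 09:37:27.659997",
        "2015-04-22 09:51:40.440003",
        "2015-04-23 10:15:25.080002",
    ]),
    "duration": [0, 1, 1, np.nan, 3, 0],
    "km": [0, 2, 3, 5, 7, np.nan],
    "value": [0, 1, 1, 0, 0, 1],
})

# Group by calendar day and by value, average, then move the
# value level from the row index into the columns.
daily = (
    df.set_index("Date")
      .groupby([pd.Grouper(freq="D"), "value"])
      .mean()
      .unstack("value")
)

# Flatten the (column, value) pairs into names such as duration_value0.
daily.columns = [f"{col}_value{val}" for col, val in daily.columns]
print(daily)
```

Days on which a given value never occurs end up as NaN in the corresponding columns.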

I think you are looking for a pivot table, i.e.

df.pivot_table(values=['duration','km'],columns=['value'],index=df['Date'].dt.date,aggfunc='mean')

Output:

           duration        km
value             0    1    0    1
Date                              
2015-03-28      0.0  1.0  0.0  2.0
2015-03-30      NaN  1.0  5.0  3.0
2015-04-22      3.0  NaN  7.0  NaN
2015-04-23      NaN  0.0  NaN  NaN

If you want new column names like duration0, duration1, ..., you can use a list comprehension, i.e. if you store the pivot table in ndf:

ndf.columns = [i[0]+str(i[1]) for i in ndf.columns]

Output:

            duration0  duration1  km0  km1
Date                                      
2015-03-28        0.0        1.0  0.0  2.0
2015-03-30        NaN        1.0  5.0  3.0
2015-04-22        3.0        NaN  7.0  NaN
2015-04-23        NaN        0.0  NaN  NaN
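Putting the pivot-table steps together, a self-contained sketch might look like the following. It assumes the nan entries in the question are missing values and that Date is already a datetime column, and it uses the duration_value0-style names from the question rather than duration0; the f-string naming is just one possible choice.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: "nan" means a missing value).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-03-28 09:07:00.800001",
        "2015-03-28 09:36:01.819998",
        "2015-03-30 09:36:06.839997",
        "2015-03-30 09:37:27.659997",
        "2015-04-22 09:51:40.440003",
        "2015-04-23 10:15:25.080002",
    ]),
    "duration": [0, 1, 1, np.nan, 3, 0],
    "km": [0, 2, 3, 5, 7, np.nan],
    "value": [0, 1, 1, 0, 0, 1],
})

# One row per calendar day, one column per (measure, value) pair.
ndf = df.pivot_table(
    values=["duration", "km"],
    columns=["value"],
    index=df["Date"].dt.date,
    aggfunc="mean",
)

# Flatten the two-level column index into single names.
ndf.columns = [f"{col}_value{val}" for col, val in ndf.columns]
print(ndf)
```

Note that df['Date'].dt.date produces an index of plain Python date objects; if a DatetimeIndex is preferred, df['Date'].dt.normalize() could be passed instead.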
