[英]Group By a Column and Sum contents of another column with Python
我有一个数据merged_df_energy
:
+------------------------+------------------------+------------------------+--------------+
| ACT_TIME_AERATEUR_1_F1 | ACT_TIME_AERATEUR_1_F3 | ACT_TIME_AERATEUR_1_F5 | class_energy |
+------------------------+------------------------+------------------------+--------------+
| 63.333333 | 63.333333 | 63.333333 | low |
| 0 | 0 | 0 | high |
| 45.67 | 0 | 55.94 | high |
| 0 | 0 | 23.99 | low |
| 0 | 20 | 23.99 | medium |
+------------------------+------------------------+------------------------+--------------+
我想为每个ACT_TIME_AERATEUR_1_Fx
( ACT_TIME_AERATEUR_1_F1
、 ACT_TIME_AERATEUR_1_F3
和ACT_TIME_AERATEUR_1_F5
)创建一个包含以下列的数据class_energy
: class_energy
和sum_time
例如对于对应于ACT_TIME_AERATEUR_1_F1
的数据帧:
+-----------------+-----------+
| class_energy | sum_time |
+-----------------+-----------+
| low | 63.333333 |
| medium | 0 |
| high | 45.67 |
+-----------------+-----------+
我要做的事情我应该像这样使用该组:
data.groupby(by=['class_energy'])['sum_time'].sum()
我该怎么做?
您可以将所有列添加到[]
以进行聚合:
print (df.groupby(by=['class_energy'])['ACT_TIME_AERATEUR_1_F1', 'ACT_TIME_AERATEUR_1_F3','ACT_TIME_AERATEUR_1_F5'].sum())
ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3 \
class_energy
high 45.670000 0.000000
low 63.333333 63.333333
medium 0.000000 20.000000
ACT_TIME_AERATEUR_1_F5
class_energy
high 55.940000
low 87.323333
medium 23.990000
您还可以使用参数as_index=False
:
print (df.groupby(by=['class_energy'], as_index=False)['ACT_TIME_AERATEUR_1_F1', 'ACT_TIME_AERATEUR_1_F3','ACT_TIME_AERATEUR_1_F5'].sum())
class_energy ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3 \
0 high 45.670000 0.000000
1 low 63.333333 63.333333
2 medium 0.000000 20.000000
ACT_TIME_AERATEUR_1_F5
0 55.940000
1 87.323333
2 23.990000
如果只需要聚合前3
列:
print (df.groupby(by=['class_energy'], as_index=False)[df.columns[:3]].sum())
class_energy ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3 \
0 high 45.670000 0.000000
1 low 63.333333 63.333333
2 medium 0.000000 20.000000
ACT_TIME_AERATEUR_1_F5
0 55.940000
1 87.323333
2 23.990000
...或没有最后一个的所有列:
print (df.groupby(by=['class_energy'], as_index=False)[df.columns[:-1]].sum())
class_energy ACT_TIME_AERATEUR_1_F1 ACT_TIME_AERATEUR_1_F3 \
0 high 45.670000 0.000000
1 low 63.333333 63.333333
2 medium 0.000000 20.000000
ACT_TIME_AERATEUR_1_F5
0 55.940000
1 87.323333
2 23.990000
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.