Filtering and selecting dataframe in pandas using outcome of .groupby
Hello Python community,
I have a "little" Python/pandas problem and would be very happy if someone could help me with it. I have a dataframe with 2 IDs, a date, the hour of day and several metrics, like in this example:
| index | ID_1 | ID_2 | date | hour | metric_1 | metric_2 | metric_3 | … | metric_k |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A101321 | 25459379 | 2021-05-09 | 0 | 11 | any value | any value | … | any value |
| 1 | A101321 | 25459379 | 2021-05-09 | 1 | 8 | any value | any value | … | any value |
| 2 | A101321 | 25459379 | 2021-05-09 | 2 | 7 | any value | any value | … | any value |
| 3 | A101321 | 25459379 | 2021-05-09 | 3 | 0 | any value | any value | … | any value |
| … | A101321 | 25459379 | 2021-05-09 | 0 | 0 | any value | any value | … | any value |
| 22 | A101321 | 25459379 | 2021-05-09 | 22 | 17 | any value | any value | … | any value |
| 23 | A101321 | 25459379 | 2021-05-09 | 23 | 11 | any value | any value | … | any value |
| 24 | A101321 | 25459379 | 2021-05-10 | 0 | 9 | any value | any value | … | any value |
| 25 | A101321 | 25459379 | 2021-05-10 | 1 | 3 | any value | any value | … | any value |
| … | … | … | … | … | … | … | … | … | … |
| n | K510325 | 105983-20 | 2021-05-23 | 0 | 5 | any value | any value | … | any value |
| n+1 | K510325 | 105983-20 | 2021-05-23 | 1 | 1 | any value | any value | … | any value |
For each metric, one value is recorded per device, per day, per hour. A device is identified by the combination of the 2 IDs because neither ID is unique on its own. Now I want to know, per device and per day, at which hour e.g. metric_1 reaches its maximum value, so that I can look at the distribution of hours with maximum values.
Using
df[['ID_1', 'ID_2', 'date', 'metric_1']].groupby(['ID_1', 'ID_2', 'date'], as_index=False).max()
I do get the maximum value of the day for each device:
| index | ID_1 | ID_2 | date | metric_1 |
|---|---|---|---|---|
| 0 | A101321 | 25459379 | 2021-05-09 | 17 |
| 1 | A101321 | 25459379 | 2021-05-10 | 9 |
| ... | ... | ... | ... | ... |
| m | K510325 | 105983-20 | 2021-05-23 | 5 |
but I can't see at what hour that maximum occurs, and all my attempts to get it have failed miserably so far... Can someone please help me with this?
If your groupby already produces the results you want, use idxmax() in place of max():
df[['ID_1', 'ID_2', 'date', 'metric_1']].groupby(['ID_1', 'ID_2', 'date']).idxmax()
This returns the index labels of the rows that hold those maximum values. Then subset your original data frame to those rows using
df.loc[indices, list_of_columns]
where indices holds the index labels from the code above (its metric_1 column, since idxmax on a grouped data frame returns a data frame) and list_of_columns includes hour.
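To make the two steps concrete, here is a minimal sketch on made-up data shaped like the table in the question. The toy values and the names keys, idx, peak and hour_distribution are my own assumptions for illustration, not from the original post. It selects metric_1 before calling idxmax so that the result is a Series of row labels that can be passed straight to df.loc.

```python
import pandas as pd

# Hypothetical toy data mirroring the question's layout (values are made up).
df = pd.DataFrame({
    "ID_1": ["A101321"] * 5 + ["K510325"] * 2,
    "ID_2": ["25459379"] * 5 + ["105983-20"] * 2,
    "date": ["2021-05-09"] * 3 + ["2021-05-10"] * 2 + ["2021-05-23"] * 2,
    "hour": [0, 1, 22, 0, 1, 0, 1],
    "metric_1": [11, 8, 17, 9, 3, 5, 1],
})

keys = ["ID_1", "ID_2", "date"]

# Step 1: per device and day, get the row label where metric_1 is largest.
# Selecting the single column first yields a Series rather than a DataFrame.
idx = df.groupby(keys)["metric_1"].idxmax()

# Step 2: pull those rows from the original frame, including the hour.
peak = df.loc[idx, keys + ["hour", "metric_1"]].reset_index(drop=True)

# Distribution of the hours at which the daily maximum occurs.
hour_distribution = peak["hour"].value_counts().sort_index()

print(peak)
print(hour_distribution)
```

One thing to keep in mind: if several hours share the same daily maximum, idxmax only reports the first of them, so ties would not show up in the distribution.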