
Filtering and selecting dataframe in pandas using outcome of .groupby

Hello Python community,

I have a "little" Python/Pandas problem and would be very happy if someone could help me with it on short notice. I have a dataframe with 2 IDs, date, hour of day and several metrics, as in this example:

index  ID_1     ID_2       date        hour  metric_1  metric_2   metric_3   metric_k
0      A101321  25459379   2021-05-09  0     11        any value  any value  any value
1      A101321  25459379   2021-05-09  1     8         any value  any value  any value
2      A101321  25459379   2021-05-09  2     7         any value  any value  any value
3      A101321  25459379   2021-05-09  3     0         any value  any value  any value
...    A101321  25459379   2021-05-09  0     0         any value  any value  any value
22     A101321  25459379   2021-05-09  22    17        any value  any value  any value
23     A101321  25459379   2021-05-09  23    11        any value  any value  any value
24     A101321  25459379   2021-05-10  0     9         any value  any value  any value
25     A101321  25459379   2021-05-10  1     3         any value  any value  any value
n      K510325  105983-20  2021-05-23  0     5         any value  any value  any value
n+1    K510325  105983-20  2021-05-23  1     1         any value  any value  any value

For each metric, one value is determined per device per day per hour. A device is identified by the combination of 2 IDs because one ID alone is not unique. Now I want to know, per device and per day, at which hour e.g. metric_1 reaches its maximum value, in order to see the distribution of hours with maximum values.

Using df[['ID_1', 'ID_2', 'date', 'metric_1']].groupby(['ID_1', 'ID_2', 'date'], as_index=False).max() I do get the maximum value of the day displayed for each device:

index  ID_1     ID_2       date        metric_1
0      A101321  25459379   2021-05-09  17
1      A101321  25459379   2021-05-10  9
...    ...      ...        ...         ...
m      K510325  105983-20  2021-05-23  5

but I can't see at what hour, and all attempts to achieve this have failed miserably so far... Can someone please help me with this?

If you take your groupby, which seems to produce the groups you want, use idxmax() instead of max():

df[['ID_1', 'ID_2', 'date', 'metric_1']].groupby(['ID_1', 'ID_2', 'date']).idxmax()

This gets the index labels of the rows that hold those maximum values. If the result of the above code is assigned to indices, it is a dataframe with a single metric_1 column, so subset your original data frame for those rows using df.loc[indices['metric_1'], list_of_columns], where list_of_columns includes hour.
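As a minimal sketch of the approach above, the following uses a small made-up dataframe (the values are illustrative, not the asker's real data) and selects metric_1 before calling idxmax(), so that the result is a Series of row labels that can be passed straight to df.loc. The names peak_rows and hour_counts are hypothetical, and the sketch assumes the dataframe's index is unique.

```python
import pandas as pd

# Hypothetical toy data in the shape described in the question.
df = pd.DataFrame({
    "ID_1": ["A101321"] * 5 + ["K510325"] * 2,
    "ID_2": ["25459379"] * 5 + ["105983-20"] * 2,
    "date": ["2021-05-09"] * 3 + ["2021-05-10"] * 2 + ["2021-05-23"] * 2,
    "hour": [0, 1, 22, 0, 1, 0, 1],
    "metric_1": [11, 8, 17, 9, 3, 5, 1],
})

# Row label of the daily maximum of metric_1 for each device and date.
# On ties, idxmax() returns the label of the first occurrence.
indices = df.groupby(["ID_1", "ID_2", "date"])["metric_1"].idxmax()

# Look those rows up in the original frame to recover the hour.
peak_rows = df.loc[indices, ["ID_1", "ID_2", "date", "hour", "metric_1"]]

# Distribution of the hours at which the daily maximum occurs.
hour_counts = peak_rows["hour"].value_counts().sort_index()

print(peak_rows)
print(hour_counts)
```

Because idxmax() keeps only the first row on a tie, a day on which two hours share the maximum contributes a single hour to the distribution; whether that is acceptable depends on the analysis.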
