
python: pandas: how to find max value in a column based on groupby another column

I want to group my dataframe by one column, SERVER, and then find the max value in another column, JOB_ID.

DF:

     SERVER   JOB_ID     LOG_FILE                 TIME
0    abc_123      1   1/abc_123/dep2/1/123.log  2019-12-05T05:06:16.346Z
1    abc_123     10  1/abc_123/dep2/10/123.log  2019-12-04T17:05:28.335Z
2    abc_123     11  1/abc_123/dep2/11/123.log  2019-12-04T20:27:03.988Z
3    abc_123     12  1/abc_123/dep2/12/123.log  2019-12-04T20:35:49.039Z
4    abc_123     13  1/abc_123/dep2/13/123.log  2019-12-04T20:42:36.890Z
5    abc_123     14  1/abc_123/dep2/14/123.log  2019-12-04T20:52:01.295Z
6    abc_123     15  1/abc_123/dep2/15/123.log  2019-12-04T20:58:07.132Z
7    abc_123     16  1/abc_123/dep2/16/123.log  2019-12-04T20:59:51.877Z
8    abc_123     17  1/abc_123/dep2/17/123.log  2019-12-04T21:00:23.458Z
9    abc_123     18  1/abc_123/dep2/18/123.log  2019-12-04T21:05:48.047Z
10   abc_123     19  1/abc_123/dep2/19/123.log  2019-12-05T03:10:39.325Z
11   abc_123      2   1/abc_123/dep2/2/123.log  2019-12-04T15:37:41.540Z
12   abc_123     20  1/abc_123/dep2/20/123.log  2019-12-05T04:09:39.221Z
13   abc_123     21  1/abc_123/dep2/21/123.log  2019-12-05T04:14:54.228Z
14   abc_123      3   1/abc_123/dep2/3/123.log  2019-12-04T15:41:38.340Z
15   abc_123      4   1/abc_123/dep2/4/123.log  2019-12-04T15:43:34.277Z
16   abc_123      5   1/abc_123/dep2/5/123.log  2019-12-04T15:56:18.647Z
17   abc_123      6   1/abc_123/dep2/6/123.log  2019-12-04T16:14:23.323Z
18   abc_123      7   1/abc_123/dep2/7/123.log  2019-12-04T16:19:22.126Z
19   abc_123      8   1/abc_123/dep2/8/123.log  2019-12-04T16:32:30.121Z
20   abc_123      9   1/abc_123/dep2/9/123.log  2019-12-04T16:53:54.236Z
21   abc_123      1   1/abc_123/dep_1/1/123.log  2019-11-30T06:20:16.528Z
22   abc_123     10  1/abc_123/dep_1/10/123.log  2019-12-03T07:10:38.320Z
23   abc_123     11  1/abc_123/dep_1/11/123.log  2019-12-03T09:19:33.350Z
24   abc_123     12  1/abc_123/dep_1/12/123.log  2019-12-03T09:51:49.835Z
25   abc_123     13  1/abc_123/dep_1/13/123.log  2019-12-03T10:43:19.727Z
26   abc_123     14  1/abc_123/dep_1/14/123.log  2019-12-04T06:11:52.125Z
27   abc_123     15  1/abc_123/dep_1/15/123.log  2019-12-04T06:33:58.416Z
28   abc_123     16  1/abc_123/dep_1/16/123.log  2019-12-04T06:48:18.057Z
29   abc_123      2   1/abc_123/dep_1/2/123.log  2019-11-30T16:45:13.983Z
30   abc_123      3   1/abc_123/dep_1/3/123.log  2019-11-30T18:19:14.364Z
31   abc_123      4   1/abc_123/dep_1/4/123.log  2019-12-02T08:38:01.766Z
32   abc_123      5   1/abc_123/dep_1/5/123.log  2019-12-02T10:12:45.500Z
33   abc_123      6   1/abc_123/dep_1/6/123.log  2019-12-02T12:04:03.326Z
34   abc_123      7   1/abc_123/dep_1/7/123.log  2019-12-02T15:13:11.312Z
35   abc_123      8   1/abc_123/dep_1/8/123.log  2019-12-03T05:44:47.436Z
36   abc_123      9   1/abc_123/dep_1/9/123.log  2019-12-03T06:16:05.041Z

When I run the code below:

DF_FINAL = DF.groupby(['SERVER']).agg({'JOB_ID':'max'})
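A minimal reproduction of the problem, assuming (as the answer below explains) that JOB_ID was loaded as strings. The four-row sample is hypothetical and much smaller than the table above. Note that `agg` returns only the aggregated JOB_ID column, indexed by SERVER, not the whole row.

```python
import pandas as pd

# Hypothetical subset of the question's data, with JOB_ID stored as strings.
DF = pd.DataFrame({
    "SERVER": ["abc_123"] * 4,
    "JOB_ID": ["1", "10", "21", "9"],
})

# Same call as in the question.
DF_FINAL = DF.groupby(["SERVER"]).agg({"JOB_ID": "max"})
print(DF_FINAL)
# Strings compare character by character, so "9" > "21" and "9" is reported as the max.
```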

I get the output below:

     SERVER   JOB_ID     LOG_FILE                 TIME
20   abc_123      9   1/abc_123/dep2/9/123.log  2019-12-04T16:53:54.236Z

Expected output:

13   abc_123     21  1/abc_123/dep2/21/123.log  2019-12-05T04:14:54.228Z

I referred to this link, but it's not giving me the correct answer.

Column JOB_ID is not numeric; it holds strings (its dtype is object), so max compares the values lexicographically and '9' ranks above '21'. Convert the column to numeric before applying your solution:

DF.JOB_ID = DF.JOB_ID.astype(int)
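A short sketch of the effect of the conversion, using a hypothetical four-row sample. Once JOB_ID holds integers, the question's original `agg` call returns the expected maximum.

```python
import pandas as pd

# Hypothetical sample: JOB_ID read in as strings, as in the question.
DF = pd.DataFrame({
    "SERVER": ["abc_123"] * 4,
    "JOB_ID": ["1", "10", "21", "9"],
})

# Convert the strings to integers so comparisons are numeric.
DF["JOB_ID"] = DF["JOB_ID"].astype(int)

DF_FINAL = DF.groupby(["SERVER"]).agg({"JOB_ID": "max"})
print(DF_FINAL)  # JOB_ID for abc_123 is now 21
```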

If the solution above does not work because some values are non-numeric, use the following instead; errors='coerce' turns any value that cannot be parsed into NaN:

import pandas as pd
DF.JOB_ID = pd.to_numeric(DF.JOB_ID, errors='coerce')
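A sketch of the `errors='coerce'` behaviour, assuming a hypothetical malformed value "n/a" in JOB_ID (not present in the question's data). `astype(int)` would raise on such a value, while `pd.to_numeric` replaces it with NaN, and the groupby max then skips it.

```python
import pandas as pd

# Hypothetical sample with one unparsable JOB_ID.
DF = pd.DataFrame({
    "SERVER": ["abc_123", "abc_123", "abc_123"],
    "JOB_ID": ["9", "21", "n/a"],
})

# Values that cannot be parsed as numbers become NaN instead of raising.
DF["JOB_ID"] = pd.to_numeric(DF["JOB_ID"], errors="coerce")

print(DF["JOB_ID"].isna().sum())              # 1 value was coerced to NaN
print(DF.groupby("SERVER")["JOB_ID"].max())   # NaN is skipped; the max is 21
```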

Finally, use DataFrameGroupBy.idxmax to get the index label of the row with the largest JOB_ID in each group, and select those whole rows with DataFrame.loc:

DF_FINAL = DF.loc[DF.groupby('SERVER')['JOB_ID'].idxmax()]
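An end-to-end sketch combining both steps. The sample rows are hypothetical; the second server, `xyz_999`, is made up to show that `idxmax` picks one row per group. The conversion here uses `pd.to_numeric`, but `astype(int)` works the same when every value is numeric.

```python
import pandas as pd

# Hypothetical sample modelled on the question's columns.
DF = pd.DataFrame({
    "SERVER": ["abc_123", "abc_123", "abc_123", "xyz_999", "xyz_999"],
    "JOB_ID": ["9", "21", "10", "3", "12"],
    "LOG_FILE": [
        "1/abc_123/dep2/9/123.log",
        "1/abc_123/dep2/21/123.log",
        "1/abc_123/dep_1/10/123.log",
        "1/xyz_999/dep2/3/123.log",
        "1/xyz_999/dep2/12/123.log",
    ],
})

# Step 1: make JOB_ID numeric so 21 ranks above 9.
DF["JOB_ID"] = pd.to_numeric(DF["JOB_ID"], errors="coerce")

# Step 2: idxmax gives, per SERVER, the index label of the max-JOB_ID row;
# .loc then selects those complete rows (all columns kept).
DF_FINAL = DF.loc[DF.groupby("SERVER")["JOB_ID"].idxmax()]
print(DF_FINAL)
```

If several rows in a group tie for the maximum JOB_ID, `idxmax` returns the label of the first one, so only that row is kept.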


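Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.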

Related questions:
- Pandas groupby -- get output value based on max value of another column
- Pandas Dataframe: groupby id to find max column value and return corresponding value of another column
- Pandas groupby and create max or sum based on criteria of another column
- How do I Find max value of a particular column between 2 dates based on a condition from another column in python
- How to get the whole row based on a max value from one column in pandas.groupby().max()?
- Pandas groupby with identification of an element with max value in another column
- Pandas Dataframe - GroupBy key and keep max value on a another column
- Pandas dataframe, in a row, to find the max in selected column, and find value of another column based on that
- calculate a column's value based on another column in each group of python pandas groupby
- Python Pandas Getting Values Based on Value of Another Column, Finding Max Value in Column Less Than Current Value
 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM