Pandas groupby: get the row with the maximum value across multiple columns

Question

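I'm looking for the row in each group that has the maximum value across multiple columns:

import pandas as pd

df = pd.DataFrame([
    {'grouper': 'a', 'col1': 1, 'col2': 3, 'uniq_id': 1},
    {'grouper': 'a', 'col1': 2, 'col2': 4, 'uniq_id': 2},
    {'grouper': 'a', 'col1': 3, 'col2': 2, 'uniq_id': 3},
])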

   col1  col2 grouper  uniq_id
0     1     3       a        1
1     2     4       a        2
2     3     2       a        3

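Above, I'm grouping by the grouper column. Within group a, I want the row that holds the maximum value across col1 and col2. In this case, when I group the DataFrame, I want the row with uniq_id 2, because it has the highest value in col1/col2 (4), so the result would be: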

   col1  col2 grouper  uniq_id
1     2     4       a        2

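In my real data I'm using timestamps, so I don't actually expect ties. If there is a tie, I don't care which row in the group gets selected, so it can simply be the first one in the group.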

Answer 1

Another approach you can try:

# find row wise max value
df['row_max'] = df[['col1','col2']].max(axis=1)

# filter rows from groups
df.loc[df.groupby('grouper')['row_max'].idxmax()]

   col1  col2 grouper  uniq_id  row_max
1     2     4       a        2        4

Afterwards, you can remove the helper column with df.drop('row_max', axis=1).
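
If you would rather not add and then drop a helper column, the same idea can be sketched with a standalone Series. This is only an illustration: the second group "b" and its values are made up to show the per-group behaviour and are not part of the question's data.

```python
import pandas as pd

# Hypothetical data: the question's group "a" plus an invented group "b".
df = pd.DataFrame(
    {
        "grouper": ["a", "a", "a", "b", "b"],
        "col1": [1, 2, 3, 7, 5],
        "col2": [3, 4, 2, 1, 6],
        "uniq_id": [1, 2, 3, 4, 5],
    }
)

# Row-wise maximum of the two columns, kept outside the DataFrame.
row_max = df[["col1", "col2"]].max(axis=1)

# Index label of the row with the largest row-wise maximum in each group.
# idxmax returns the first label when several rows tie.
idx = row_max.groupby(df["grouper"]).idxmax()

result = df.loc[idx]
print(result)
```

Because row_max never becomes a column of df, there is nothing to drop afterwards.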

Answer 2

IIUC, use transform and then compare the result with the original DataFrame:

g=df.groupby('grouper')
s1=g.col1.transform('max')
s2=g.col2.transform('max')
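# largest per-group maximum across both columns
s=pd.concat([s1,s2],axis=1).max(axis=1)

# keep rows where either column equals that value
df.loc[df[['col1','col2']].eq(s,axis=0).any(axis=1)]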
Out[89]: 
   col1  col2 grouper  uniq_id
1     2     4       a        2
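
One thing to be aware of with this comparison: when two rows in a group tie for the maximum, both are returned. A minimal sketch of keeping only the first match per group follows; the tied values are invented for illustration, and using drop_duplicates for the tie-break is my assumption about what "first in the group" should mean here.

```python
import pandas as pd

# Hypothetical data in which rows 1 and 2 both contain the group maximum (4).
df = pd.DataFrame(
    {
        "grouper": ["a", "a", "a"],
        "col1": [1, 4, 3],
        "col2": [3, 2, 4],
        "uniq_id": [1, 2, 3],
    }
)

g = df.groupby("grouper")

# Per-group maximum of each column, broadcast back to the original rows,
# then reduced to a single value per row.
group_max = pd.concat(
    [g["col1"].transform("max"), g["col2"].transform("max")], axis=1
).max(axis=1)

# Every row where either column equals the group maximum (ties included).
matches = df.loc[df[["col1", "col2"]].eq(group_max, axis=0).any(axis=1)]

# Keep only the first matching row of each group.
first_per_group = matches.drop_duplicates("grouper")
print(first_per_group)
```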

Answer 3

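Interesting approaches all around. Adding one more just to show off apply (which I'm a big fan of), combined with some of the other methods already mentioned.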

import pandas as pd

df = pd.DataFrame(
    [
        {"grouper": "a", "col1": 1, "col2": 3, "uniq_id": 1},
        {"grouper": "a", "col1": 2, "col2": 4, "uniq_id": 2},
        {"grouper": "a", "col1": 3, "col2": 2, "uniq_id": 3},
    ]
)

def find_max(grp):
    # find max value per row, then find index of row with max val
    max_row_idx = grp[["col1", "col2"]].max(axis=1).idxmax()
    return grp.loc[max_row_idx]

df.groupby("grouper").apply(find_max)

Answer 4

value  = pd.concat([df['col1'], df['col2']], axis = 0).max()
df.loc[(df['col1'] == value) | (df['col2'] == value), :]

   col1  col2 grouper  uniq_id
1     2     4       a        2

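This may not be the fastest way, but it solves your problem. It concatenates the two columns, finds the maximum, then searches df for rows where either column equals that value. Note that it takes the maximum over the whole DataFrame, so it matches the question only when there is a single group.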
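
For more than one group, a group-aware variant of the same equality search might look like the sketch below. The group "b" rows are invented for illustration, and swapping the global maximum for a groupby transform is my adaptation rather than part of the answer above.

```python
import pandas as pd

# Hypothetical data: the question's group "a" plus an invented group "b".
df = pd.DataFrame(
    {
        "grouper": ["a", "a", "a", "b", "b"],
        "col1": [1, 2, 3, 7, 5],
        "col2": [3, 4, 2, 1, 6],
        "uniq_id": [1, 2, 3, 4, 5],
    }
)

# Global version, as in the answer: only the overall maximum is found.
value = pd.concat([df["col1"], df["col2"]], axis=0).max()
global_result = df.loc[(df["col1"] == value) | (df["col2"] == value), :]

# Group-aware version: compare each row's maximum with its group's maximum.
row_max = df[["col1", "col2"]].max(axis=1)
group_max = row_max.groupby(df["grouper"]).transform("max")
per_group_result = df.loc[row_max == group_max]

print(global_result)
print(per_group_result)
```

Like the original, the group-aware mask keeps every tied row, so a tie-break step would still be needed if exactly one row per group is required.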

Answer 5

You can use numpy and pandas as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3],
          'col2': [3, 4, 2],
          'grouper': ['a', 'a', 'a'],
          'uniq_id': [1, 2, 3]})

df['temp'] = np.max([df.col1.values, df.col2.values],axis=0)
idx = df.groupby('grouper')['temp'].idxmax()
df.loc[idx].drop('temp', axis=1)
   col1  col2 grouper  uniq_id
1     2     4       a        2
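
Since the question mentions that the real columns hold timestamps, here is a rough sketch of the same numpy idea on datetime data. The dates and the second group are made up, and I'm using np.maximum for the element-wise comparison instead of np.max over a list, which is my substitution.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamp data; the values are invented for illustration.
df = pd.DataFrame(
    {
        "grouper": ["a", "a", "b"],
        "col1": pd.to_datetime(["2019-05-01", "2019-05-03", "2019-05-02"]),
        "col2": pd.to_datetime(["2019-05-04", "2019-05-02", "2019-05-05"]),
        "uniq_id": [1, 2, 3],
    }
)

# Element-wise later timestamp of the two columns, as a Series aligned to df.
latest = pd.Series(
    np.maximum(df["col1"].to_numpy(), df["col2"].to_numpy()), index=df.index
)

# Index label of the row with the latest timestamp in each group.
idx = latest.groupby(df["grouper"]).idxmax()

result = df.loc[idx]
print(result)
```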

5 answers

Answer 1: score 3, accepted, 2019-05-18 23:58:59
Answer 2: score 2, 2019-05-18 23:50:47
Answer 3: score 2, 2019-05-19 00:19:48
Answer 4: score 0, 2019-05-18 23:37:50
Answer 5: score 0, 2019-05-19 03:40:07
