Pandas groupby and compare rows to find the maximum value

Question

I have a dataframe:

a	b	c
one	6	11
one	7	12
two	8	23
two	9	14
three	10	15
three	20	25

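I want to group by column a and then find the highest value in column c within each group, so that the row holding the maximum is flagged, i.e.

a	b	c
one	6	11
one	7	12

compare the values 11 and 12, then

a	b	c
two	8	23
two	9	14

compare the values 23 and 14, then

a	b	c
three	10	15
three	20	25

finally resulting in: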

a	b	c	flag
one	6	11	no
one	7	12	yes
two	8	23	yes
two	9	14	no
three	10	15	no
three	20	25	yes

Input/output DF:

import pandas as pd

df = pd.DataFrame({
    'a':["one","one","two","two","three","three"]
    , 'b':[6,7,8,9,10,20]
    , 'c':[11,12,23,14,15,25]
    # desired output column:
    # , 'flag': ['no', 'yes', 'yes', 'no', 'no', 'yes']
})
df

Answer 1

You can use groupby.transform to get the maximum of each group, and numpy.where to map True / False to 'yes' / 'no':

df['flag'] = np.where(df.groupby('a')['c'].transform('max').eq(df['c']), 'yes', 'no')

Output:

       a   b   c flag
0    one   6  11   no
1    one   7  12  yes
2    two   8  23  yes
3    two   9  14   no
4  three  10  15   no
5  three  20  25  yes

Intermediates:

df.groupby('a')['c'].transform('max')

0    12
1    12
2    23
3    23
4    25
5    25
Name: c, dtype: int64

df.groupby('a')['c'].transform('max').eq(df['c'])
0    False
1     True
2     True
3    False
4    False
5     True
Name: c, dtype: bool
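
The steps above can be collected into one self-contained script. This is only a minimal sketch: the helper name `flag_group_max` and its parameters are hypothetical, introduced here for illustration, and are not part of pandas.

```python
import numpy as np
import pandas as pd


def flag_group_max(df, group_col, value_col):
    """Return 'yes'/'no' labels marking rows that hold their group's maximum.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Broadcast each group's maximum back onto the rows of that group.
    group_max = df.groupby(group_col)[value_col].transform('max')
    # Compare row by row, then map True/False to 'yes'/'no'.
    return np.where(df[value_col].eq(group_max), 'yes', 'no')


df = pd.DataFrame({
    'a': ["one", "one", "two", "two", "three", "three"],
    'b': [6, 7, 8, 9, 10, 20],
    'c': [11, 12, 23, 14, 15, 25],
})
df['flag'] = flag_group_max(df, 'a', 'c')
print(df)
```

Because transform returns a Series aligned with the original index, no merge or sort is needed before the comparison.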

Answer 2

Use GroupBy.transform with max, compare the result against the same column c, and then set yes/no with numpy.where:

df['flag'] = np.where(df.c.eq(df.groupby('a')['c'].transform('max')), 'yes', 'no')

print(df)
       a   b   c flag
0    one   6  11   no
1    one   7  12  yes
2    two   8  23  yes
3    two   9  14   no
4  three  10  15   no
5  three  20  25  yes

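If several rows in a group of a are tied for the maximum, this produces multiple yes values. If only the first maximum is needed, use DataFrameGroupBy.idxmax and compare against df.index: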

df = pd.DataFrame({
    'a':["one","one","one","two","three","three"]
    , 'b':[6,7,8,9,10,20]
    , 'c':[11,12,12,14,15,25]
})

df['flag1'] = np.where(df.c.eq(df.groupby('a')['c'].transform('max')), 'yes', 'no')
df['flag2'] = np.where(df.index == df.groupby('a')['c'].transform('idxmax'), 'yes', 'no')

print(df)

       a   b   c flag1 flag2
0    one   6  11    no    no
1    one   7  12   yes   yes
2    one   8  12   yes    no <- difference between matching all maxima and only the first
3    two   9  14   yes   yes
4  three  10  15    no    no
5  three  20  25   yes   yes
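
To make the difference between the two flags easier to see, the 'yes' rows can be counted per group. The following is a small sketch reusing the data above; the variable names are illustrative, and comparing `df.index` with the `idxmax` result assumes the index labels are unique (here, the default RangeIndex).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ["one", "one", "one", "two", "three", "three"],
    'b': [6, 7, 8, 9, 10, 20],
    'c': [11, 12, 12, 14, 15, 25],
})

# Per-group maximum and the index label of its first occurrence,
# both broadcast back to the original rows.
group_max = df.groupby('a')['c'].transform('max')
first_max_label = df.groupby('a')['c'].transform('idxmax')

df['flag1'] = np.where(df['c'].eq(group_max), 'yes', 'no')        # all tied maxima
df['flag2'] = np.where(df.index == first_max_label, 'yes', 'no')  # first maximum only

# Count the 'yes' rows per group for each flag.
yes_counts = df[['flag1', 'flag2']].eq('yes').groupby(df['a']).sum()
print(yes_counts)
```

Group one ends up with two flagged rows under flag1 but only one under flag2, while groups without ties are flagged identically by both.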

Answer 3

One way to do it is as follows:

df['flag'] = df.apply(lambda x: 'yes' if x['c'] == df.groupby('a')['c'].max().loc[x['a']] else 'no', axis=1)

       a   b   c flag
0    one   6  11   no
1    one   7  12  yes
2    two   8  23  yes
3    two   9  14   no
4  three  10  15   no
5  three  20  25  yes

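Breaking down the individual steps performed above:

df['flag'] creates a new column named flag.
df.groupby('a')['c'].max() groups by column a, using pandas.DataFrame.groupby, and finds the maximum value of column c in each group: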
```
 df2 = df.groupby('a')['c'].max()
```

Then, for each row, we check whether its value equals the maximum stored for that row's own group in the Series produced in step 2.

 df['flag'] = df.apply(lambda x: 'yes' if x['c'] == df2.loc[x['a']] else 'no', axis=1)
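
For reference, the object produced in step 2 is a Series whose index holds the group labels from column a, so a group's maximum is looked up by its a value rather than by a c value. A brief sketch using the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ["one", "one", "two", "two", "three", "three"],
    'b': [6, 7, 8, 9, 10, 20],
    'c': [11, 12, 23, 14, 15, 25],
})

# One maximum per group; the group labels become the index
# (sorted alphabetically by default).
df2 = df.groupby('a')['c'].max()
print(df2)

# Look up the maximum of a single group by its label.
print(df2.loc['two'])
```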

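Notes:

Checking that the group is the same is essential. Otherwise, even though the code works for this particular case, it fails when a non-maximum value of one group equals the maximum of another group (as mozway pointed out).
As the answer shared by jezrael shows, .apply can be slow, so even though it works, it may not be the most convenient approach.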
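
The first note can be illustrated with a small counterexample. In the sketch below the data are made up for illustration: the value 12 is the maximum of group one but only a non-maximum value in group two. A check that ignores the group flags both rows, while a comparison against the per-group maximum does not.

```python
import numpy as np
import pandas as pd

# Illustrative data: 12 is the maximum of "one" but not of "two".
df = pd.DataFrame({
    'a': ["one", "one", "two", "two"],
    'c': [11, 12, 12, 23],
})

group_max = df.groupby('a')['c'].max()

# Value-only check: ignores which group each maximum belongs to.
df['value_only'] = np.where(df['c'].isin(group_max.values), 'yes', 'no')

# Group-aware check: compare each row with the maximum of its own group.
df['group_aware'] = np.where(
    df['c'].eq(df.groupby('a')['c'].transform('max')), 'yes', 'no'
)
print(df)
```

The row (two, 12) is wrongly flagged by the value-only check and correctly left as 'no' by the group-aware one.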

3 solutions

Solution 1
3 Accepted 2022-09-20 08:08:00

Solution 2
2 2022-09-20 08:08:13

Solution 3
2 2022-09-20 08:13:54
