Filter rows in a Pandas dataframe based on the maximum value of a column

Question

I have a pandas dataframe with the following columns:

col1   col2   col3
x      12     abc
x       7     abc
x       5     abc
x       3   
y      10     abc
y       9     abc

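After filtering out the rows where col3 is null, I want to group by col1 and find all rows that have the maximum value of col2 within each group. How can I do that?

The expected output is: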

col1   col2   col3
x      12     abc
y      10     abc

So far I have tried the following code.

df[df[['col3']].notnull().all(1) & df.sort_values('col2').drop_duplicates(['col1'], keep='last')]

But I get the following error.

TypeError: unsupported operand type(s) for &: 'bool' and 'float'

Any help is much appreciated.

Answer 1

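How does the max method compute its result when no column is mentioned?

According to pd.DataFrame.max, it returns the maximum of the values over the requested axis, and the default axis is 0 (the index), so you get one maximum per column.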
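As a small illustration of the default axis, here is a minimal sketch. The two-column numeric frame is made up for this example and is not the data from the question.

```python
import pandas as pd

# Hypothetical numeric-only frame, used only to illustrate the axis argument
demo = pd.DataFrame({"col2": [12, 7, 5], "col3": [2.0, 20.0, 1.0]})

# Default axis=0: one maximum per column
col_max = demo.max()
print(col_max)

# axis=1: one maximum per row
row_max = demo.max(axis=1)
print(row_max)
```

Note that the per-column maxima (12 and 20.0) come from different rows, which is the effect described in the rest of this answer.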

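In your example there is only one numeric column, and all the values in col3 are the same. If col3 were also numeric, max would return the maximum of that column as well, and the resulting DataFrame could contain rows that do not exist in the original.

It works in this case, but if you want the rows of the output DataFrame to be the same as rows in the original, you need to be specific about which column's maximum should be considered.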

df.loc[df.notnull().all(axis=1)].groupby('col1').max().reset_index()

  col1  col2 col3
0    x    12  abc
1    y    10  abc

Or you can create a boolean Series first and assign it a name to improve readability:

m = df.notnull().all(axis=1)
df.loc[m].groupby('col1').max().reset_index()

Now suppose this were your original DataFrame:

  col1  col2  col3
0    x    12   2.0
1    x     7  20.0
2    x     5   1.0
3    x     3   NaN
4    y    10   4.0
5    y     9  11.0

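When you apply max without specifying a column name, it returns the following, where each row combines the col2 maximum and the col3 maximum from different original rows: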

  col1  col2  col3
0    x    12  20.0
1    y    10  11.0
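One way to keep whole original rows is to pick rows by the position of the col2 maximum instead of taking the maximum of every column. The following is a minimal sketch, assuming the numeric-col3 frame shown above. The names valid, idx, result and ties are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["x", "x", "x", "x", "y", "y"],
    "col2": [12, 7, 5, 3, 10, 9],
    "col3": [2.0, 20.0, 1.0, np.nan, 4.0, 11.0],
})

# Drop the rows where col3 is null
valid = df.loc[df["col3"].notnull()]

# Index label of the row holding the largest col2 in each col1 group
idx = valid.groupby("col1")["col2"].idxmax()
result = valid.loc[idx].reset_index(drop=True)
print(result)

# Variant that keeps every tied row: compare col2 with its group maximum
ties = valid[valid["col2"] == valid.groupby("col1")["col2"].transform("max")]
print(ties)
```

idxmax returns only the first row per group when several rows share the maximum, while the transform comparison keeps all of them. With this sample data both select the rows with col2 equal to 12 and 10, and col3 stays at 2.0 and 4.0 rather than 20.0 and 11.0.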

Answer 2

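Another solution:

import pandas as pd

# Build the sample DataFrame
lstColumns = ["col1", "col2", "col3"]
lstValues = [["x", 12, "abc"], ["x", 7, "abc"], ["x", 5, "abc"], ["x", 3, "abc"], ["y", 10, "abc"], ["y", 9, "abc"]]
df = pd.DataFrame(lstValues, columns=lstColumns)
# Sort so that the largest col2 of each col1 group comes last
df = df.sort_values(['col1', 'col2'], ascending=[True, True])
# Keep only the last row (largest col2) for each col1 value
newdf = df.drop_duplicates(subset='col1', keep="last")
print(newdf)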

  col1  col2 col3
0    x    12  abc
4    y    10  abc
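The sample data in this answer has "abc" in every col3 cell, so no null filtering takes place. A minimal sketch of the same sort-and-drop-duplicates idea with the filter added is shown below. It assumes that the empty col3 cell from the question is a real missing value (None), which the question does not confirm.

```python
import pandas as pd

# Same data as the question; the missing col3 cell is assumed to be None
df = pd.DataFrame(
    [["x", 12, "abc"], ["x", 7, "abc"], ["x", 5, "abc"],
     ["x", 3, None], ["y", 10, "abc"], ["y", 9, "abc"]],
    columns=["col1", "col2", "col3"],
)

newdf = (
    df.dropna(subset=["col3"])            # remove rows where col3 is null
      .sort_values(["col1", "col2"])      # largest col2 last within each col1
      .drop_duplicates(subset="col1", keep="last")
)
print(newdf)
```

If the empty cell were an empty string rather than a missing value, dropna would not remove it and a different filter would be needed.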

Answer 1
2  Accepted  2023-01-17 22:14:44

Answer 2
1  2023-01-17 22:46:33
