Keep the row whose column value is not None when dropping duplicates in Pandas

Question

Given a toy dataframe like this:

id       type      name     purpose
1       retail    tower a    sell
        retail    tower a    rent
        office      t1       sell  
2       office      t1       rent
        retail      t2       sell
        retail      t2       rent
        retail      s1       sell
5       office      s1       rent

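I want to drop duplicates based on the subset columns type and name, but instead of keeping the first or last occurrence (df.drop_duplicates(subset=['type', 'name'], keep='last')), I want to keep the row whose id is not None. If no row in a group has an id, the last row of the group should be kept, as the expected result below shows.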
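To reproduce the problem, here is a minimal sketch of the toy frame, assuming the blank id cells are missing values (None, which pandas stores as NaN). It shows why keep='last' alone is not enough: it ignores id, so the ('retail', 'tower a') group keeps row 1, which has no id, instead of row 0.

```python
import pandas as pd

# Toy frame from the question; the blank id cells are assumed to be missing (None -> NaN).
df = pd.DataFrame({
    "id":      [1, None, None, 2, None, None, None, 5],
    "type":    ["retail", "retail", "office", "office", "retail", "retail", "retail", "office"],
    "name":    ["tower a", "tower a", "t1", "t1", "t2", "t2", "s1", "s1"],
    "purpose": ["sell", "rent", "sell", "rent", "sell", "rent", "sell", "rent"],
})

# keep="last" only looks at position within each (type, name) group,
# so for ('retail', 'tower a') it keeps row 1 (no id) rather than row 0 (id 1).
naive = df.drop_duplicates(subset=["type", "name"], keep="last")
print(naive.index.tolist())  # [1, 3, 5, 6, 7]
```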

The expected result would look like this:

id       type      name     purpose
1       retail    tower a    sell
2       office      t1       rent
        retail      t2       rent
        retail      s1       sell
5       office      s1       rent

How can I do this in Python? Thanks.

Answer 1

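You can create a helper column by testing id for non-missing values, reverse the row order with iloc[::-1], and then use SeriesGroupBy.idxmax to get, for each group, the index of the maximum helper value. Because idxmax returns the first occurrence of the maximum, on the reversed frame this is the last row with a non-missing id (or simply the last row of the group if every id is missing). Finally, pass those indices to loc: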

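# tmp is True where id is present; on the reversed frame idxmax picks the last such row per group
idx = df.assign(tmp = df['id'].notna()).iloc[::-1].groupby(['type','name'], sort=False)['tmp'].idxmax()
# reverse again so the groups come back in their original order
res = df.loc[idx.iloc[::-1]]
print(res)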
    id    type     name purpose
0  1.0  retail  tower a    sell
3  2.0  office       t1    rent
5  NaN  retail       t2    rent
6  NaN  retail       s1    sell
7  5.0  office       s1    rent
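The reversal matters because Series.idxmax returns the label of the first occurrence of the maximum. A small sketch on a hypothetical boolean flag (standing in for the tmp column; the index labels are made up) shows this behavior, including the fallback when no id is present at all:

```python
import pandas as pd

# Hypothetical flag standing in for the answer's tmp column:
# True where id is present. The index labels are made up.
has_id = pd.Series([False, True, False, True], index=[10, 11, 12, 13])

# idxmax returns the label of the FIRST occurrence of the maximum (True).
first_with_id = has_id.idxmax()

# Reversing first makes idxmax land on the LAST True instead.
last_with_id = has_id.iloc[::-1].idxmax()

# With no True at all the maximum is False, so idxmax falls back to the
# first row, or to the last row once the order is reversed.
no_id = pd.Series([False, False], index=[20, 21])
fallback_first = no_id.idxmax()
fallback_last = no_id.iloc[::-1].idxmax()

print(first_with_id, last_with_id, fallback_first, fallback_last)  # 11 13 20 21
```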

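If you want to keep the first row instead (the first row with an id, or the first row of the group when every id is missing):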

idx = df.assign(tmp = df['id'].notna()).groupby(['type','name'], sort=False)['tmp'].idxmax()
df = df.loc[idx]
print(df)
    id    type     name purpose
0  1.0  retail  tower a    sell
3  2.0  office       t1    rent
4  NaN  retail       t2    sell
6  NaN  retail       s1    sell
7  5.0  office       s1    rent
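A different technique that avoids idxmax, offered as a sketch rather than as part of the answer: stably sort so that rows with an id move to one end of the frame, then let drop_duplicates pick them. The helper name drop_dups_prefer_id and its fallback parameter are hypothetical, and the toy frame assumes the blank id cells are missing values.

```python
import pandas as pd

# Toy frame from the question; blank id cells assumed to be missing (NaN).
df = pd.DataFrame({
    "id":      [1, None, None, 2, None, None, None, 5],
    "type":    ["retail", "retail", "office", "office", "retail", "retail", "retail", "office"],
    "name":    ["tower a", "tower a", "t1", "t1", "t2", "t2", "s1", "s1"],
    "purpose": ["sell", "rent", "sell", "rent", "sell", "rent", "sell", "rent"],
})


def drop_dups_prefer_id(frame, subset=("type", "name"), fallback="last"):
    """Hypothetical helper: keep one row per subset group, preferring rows with an id.

    fallback="last": last row with an id, else the last row of the group.
    fallback="first": first row with an id, else the first row of the group.
    """
    has_id = frame["id"].notna()
    if fallback == "last":
        # Rows without an id sort first, rows with an id last. mergesort is
        # stable, so the original order is kept within each block and
        # keep="last" lands on the last row that has an id (if any).
        order = has_id.sort_values(kind="mergesort").index
        keep = "last"
    elif fallback == "first":
        # Mirror image: rows with an id come first, then keep="first".
        order = (~has_id).sort_values(kind="mergesort").index
        keep = "first"
    else:
        raise ValueError("fallback must be 'first' or 'last'")
    kept = frame.loc[order].drop_duplicates(subset=list(subset), keep=keep)
    # Restore the original row order.
    return kept.sort_index()


print(drop_dups_prefer_id(df).index.tolist())                    # [0, 3, 5, 6, 7]
print(drop_dups_prefer_id(df, fallback="first").index.tolist())  # [0, 3, 4, 6, 7]
```

Both index lists match the two outputs shown in the answer. The stable sort is what keeps the "first"/"last" choice meaningful among rows that share the same id flag.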

Answer 1: accepted 2020-10-13 06:25:57
