
Keep rows if one column's values are not None while dropping duplicates in Pandas


Given a toy dataframe as follows:

id   type     name      purpose
1    retail   tower a   sell
     retail   tower a   rent
     office   t1        sell
2    office   t1        rent
     retail   t2        sell
     retail   t2        rent
     retail   s1        sell
5    office   s1        rent
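To experiment with the answers below, the toy frame can be built as follows. This is a sketch that assumes the blank id cells are missing values (NaN), which is what the printed output in the answer suggests; the exact construction is not part of the original question.

```python
import numpy as np
import pandas as pd

# Toy data from the question; blank ids are assumed to be NaN (missing)
df = pd.DataFrame({
    'id': [1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 5],
    'type': ['retail', 'retail', 'office', 'office', 'retail', 'retail', 'retail', 'office'],
    'name': ['tower a', 'tower a', 't1', 't1', 't2', 't2', 's1', 's1'],
    'purpose': ['sell', 'rent', 'sell', 'rent', 'sell', 'rent', 'sell', 'rent'],
})
print(df)
```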

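I want to drop duplicates based on the subset columns type and name, but instead of simply keeping the first or last row (df.drop_duplicates(subset=['type', 'name'], keep='last')), I want to keep the row whose id column is not None. If no row in a group has an id, the last row is kept.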
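For comparison, a plain drop_duplicates call with keep='last' shows the problem. This sketch again assumes the blank ids are NaN: for the retail / tower a group it keeps row 1 (no id) instead of row 0 (id 1), because drop_duplicates only looks at row position, not at whether id is present.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 5],
    'type': ['retail', 'retail', 'office', 'office', 'retail', 'retail', 'retail', 'office'],
    'name': ['tower a', 'tower a', 't1', 't1', 't2', 't2', 's1', 's1'],
    'purpose': ['sell', 'rent', 'sell', 'rent', 'sell', 'rent', 'sell', 'rent'],
})

# keep='last' keeps the last row per (type, name), regardless of id
naive = df.drop_duplicates(subset=['type', 'name'], keep='last')
print(naive)
```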

The expected result would look like this:

id   type     name      purpose
1    retail   tower a   sell
2    office   t1        rent
     retail   t2        rent
     retail   s1        sell
5    office   s1        rent

How can I do this in Python? Thanks.

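You can create a helper column that tests id for non-missing values, reverse the row order with iloc[::-1], and use DataFrameGroupBy.idxmax to get, for each (type, name) group, the index of the first maximum in the reversed order. That is the last row with a non-missing id, or the group's last row if none has one (all helper values are then False, so idxmax returns the first label in reversed order). Finally, pass these indices to loc: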

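# Helper column tmp: True where id is present; reverse rows so idxmax picks the last such row per group
idx = df.assign(tmp=df['id'].notna()).iloc[::-1].groupby(['type', 'name'], sort=False)['tmp'].idxmax()
# Undo the reversal of the group order, then select the chosen rows (df itself is left unchanged)
out = df.loc[idx.iloc[::-1]]
print(out)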
    id    type     name purpose
0  1.0  retail  tower a    sell
3  2.0  office       t1    rent
5  NaN  retail       t2    rent
6  NaN  retail       s1    sell
7  5.0  office       s1    rent
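As a cross-check, here is a different technique (a stable sort followed by drop_duplicates, not the idxmax approach used in the answer), again assuming the blank ids are NaN. Sorting on a has-id flag with kind='stable' moves the rows that have an id to the end while preserving their original order inside each block, so keep='last' then prefers them and falls back to the group's last row when no row has an id. The helper column name _has_id is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 5],
    'type': ['retail', 'retail', 'office', 'office', 'retail', 'retail', 'retail', 'office'],
    'name': ['tower a', 'tower a', 't1', 't1', 't2', 't2', 's1', 's1'],
    'purpose': ['sell', 'rent', 'sell', 'rent', 'sell', 'rent', 'sell', 'rent'],
})

out = (
    df.assign(_has_id=df['id'].notna())        # hypothetical helper flag
      .sort_values('_has_id', kind='stable')   # False first, True last, original order kept within each
      .drop_duplicates(subset=['type', 'name'], keep='last')
      .sort_index()                            # restore original row order
      .drop(columns='_has_id')
)
print(out)
```

On the toy data this selects rows 0, 3, 5, 6 and 7, the same rows as the output above.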

If you want to keep the first such row instead, skip the reversal:

# Without the reversal, idxmax picks the first row with an id per group (or the group's first row)
idx = df.assign(tmp=df['id'].notna()).groupby(['type', 'name'], sort=False)['tmp'].idxmax()
out = df.loc[idx]
print(out)
    id    type     name purpose
0  1.0  retail  tower a    sell
3  2.0  office       t1    rent
4  NaN  retail       t2    sell
6  NaN  retail       s1    sell
7  5.0  office       s1    rent
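The same stable-sort idea (again an alternative technique, not the answer's) covers the keep-first variant, assuming the blank ids are NaN. Here the hypothetical helper column _missing_id is True where id is missing; a stable ascending sort puts the rows that have an id first, in their original order, so keep='first' prefers them and otherwise falls back to the group's first row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 5],
    'type': ['retail', 'retail', 'office', 'office', 'retail', 'retail', 'retail', 'office'],
    'name': ['tower a', 'tower a', 't1', 't1', 't2', 't2', 's1', 's1'],
    'purpose': ['sell', 'rent', 'sell', 'rent', 'sell', 'rent', 'sell', 'rent'],
})

out = (
    df.assign(_missing_id=df['id'].isna())         # hypothetical helper flag
      .sort_values('_missing_id', kind='stable')   # rows with an id first, original order kept
      .drop_duplicates(subset=['type', 'name'], keep='first')
      .sort_index()
      .drop(columns='_missing_id')
)
print(out)
```

On the toy data this selects rows 0, 3, 4, 6 and 7, matching the keep-first output above.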


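Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or link to the original. For any questions, contact yoyou2525@163.com.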

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM