Keep rows if one column's values are not None while dropping duplicates in Pandas
Given a toy DataFrame as follows:
id   type    name     purpose
1    retail  tower a  sell
     retail  tower a  rent
     office  t1       sell
2    office  t1       rent
     retail  t2       sell
     retail  t2       rent
     retail  s1       sell
5    office  s1       rent
I want to drop duplicates based on the subset columns type and name, but instead of keeping first or last (df.drop_duplicates(subset=['type', 'name'], keep='last')), I want to keep the row whose id column is not None.
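For reference, here is a minimal sketch of the setup, assuming the blank id cells in the table are NaN (the construction is reconstructed from the table, not posted by the asker). It shows why neither keep option is enough on its own: keep='last' discards the row with id 1, and keep='first' discards the row with id 2.

```python
import numpy as np
import pandas as pd

# Toy data rebuilt from the table in the question; blank ids are assumed to be NaN.
df = pd.DataFrame({
    "id": [1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 5],
    "type": ["retail", "retail", "office", "office",
             "retail", "retail", "retail", "office"],
    "name": ["tower a", "tower a", "t1", "t1", "t2", "t2", "s1", "s1"],
    "purpose": ["sell", "rent", "sell", "rent", "sell", "rent", "sell", "rent"],
})

# keep='last' picks row 1 for ("retail", "tower a"), so the row with id 1 is dropped.
last = df.drop_duplicates(subset=["type", "name"], keep="last")

# keep='first' picks row 2 for ("office", "t1"), so the row with id 2 is dropped.
first = df.drop_duplicates(subset=["type", "name"], keep="first")

print(last.index.tolist())   # [1, 3, 5, 6, 7]
print(first.index.tolist())  # [0, 2, 4, 6, 7]
```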
The expected result looks like this:
id   type    name     purpose
1    retail  tower a  sell
2    office  t1       rent
     retail  t2       rent
     retail  s1       sell
5    office  s1       rent
How can I do this in Python? Thanks.
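You can create a helper column that flags non-missing values, reverse the row order with iloc, and take the index label of the maximum in each group with DataFrameGroupBy.idxmax. Because the rows are reversed, that label belongs to the last row with a non-missing id (or to the last row of the group when every id is missing). Finally, pass the labels to loc: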
# sort=False keeps the groups in order of appearance, so reversing idx restores the original row order
idx = df.assign(tmp=df['id'].notna()).iloc[::-1].groupby(['type', 'name'], sort=False)['tmp'].idxmax()
out = df.loc[idx.iloc[::-1]]
print(out)
    id    type     name purpose
0  1.0  retail  tower a    sell
3  2.0  office       t1    rent
5  NaN  retail       t2    rent
6  NaN  retail       s1    sell
7  5.0  office       s1    rent
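To see why this works, the chain can be unrolled into its intermediate steps. This is only an illustrative breakdown on the toy data, with the frame rebuilt under the assumption that blank ids are NaN; the names flagged, reversed_rows and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Toy data rebuilt from the question; blank ids are assumed to be NaN.
df = pd.DataFrame({
    "id": [1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 5],
    "type": ["retail", "retail", "office", "office",
             "retail", "retail", "retail", "office"],
    "name": ["tower a", "tower a", "t1", "t1", "t2", "t2", "s1", "s1"],
    "purpose": ["sell", "rent", "sell", "rent", "sell", "rent", "sell", "rent"],
})

# Step 1: helper column, True where id is present.
flagged = df.assign(tmp=df["id"].notna())

# Step 2: reverse the rows so the last row of each group is seen first.
reversed_rows = flagged.iloc[::-1]

# Step 3: idxmax returns the label of the first maximum per group.
# True > False, so a row with an id wins; if every id in the group is
# missing, the first row seen (the last one in original order) is used.
idx = reversed_rows.groupby(["type", "name"], sort=False)["tmp"].idxmax()

# Step 4: reverse the labels back to the original order and select the rows.
result = df.loc[idx.iloc[::-1]]

print(idx.tolist())           # [7, 6, 5, 3, 0]
print(result.index.tolist())  # [0, 3, 5, 6, 7]
```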
To keep the first row with a non-missing id instead (or the first row of the group when every id is missing):
idx = df.assign(tmp=df['id'].notna()).groupby(['type', 'name'], sort=False)['tmp'].idxmax()
out = df.loc[idx]
print(out)
    id    type     name purpose
0  1.0  retail  tower a    sell
3  2.0  office       t1    rent
4  NaN  retail       t2    sell
6  NaN  retail       s1    sell
7  5.0  office       s1    rent
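As a cross-check, the keep-first variant can also be reproduced with a different technique: a stable sort that moves rows with an id to the front, followed by a plain drop_duplicates. This is an alternative sketch, not the method from the answer above; the helper column name _missing is made up for illustration, and blank ids are again assumed to be NaN.

```python
import numpy as np
import pandas as pd

# Toy data rebuilt from the question; blank ids are assumed to be NaN.
df = pd.DataFrame({
    "id": [1, np.nan, np.nan, 2, np.nan, np.nan, np.nan, 5],
    "type": ["retail", "retail", "office", "office",
             "retail", "retail", "retail", "office"],
    "name": ["tower a", "tower a", "t1", "t1", "t2", "t2", "s1", "s1"],
    "purpose": ["sell", "rent", "sell", "rent", "sell", "rent", "sell", "rent"],
})

out = (
    df.assign(_missing=df["id"].isna())        # False where id is present
      .sort_values("_missing", kind="stable")  # rows with an id first, ties keep their order
      .drop_duplicates(subset=["type", "name"], keep="first")
      .drop(columns="_missing")
      .sort_index()                            # back to the original row order
)

print(out.index.tolist())  # [0, 3, 4, 6, 7]
```

The stable sort matters here: it preserves the original order among rows that tie on _missing, so the first row of an all-missing group is still the one that survives.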