Filter dataframe by removing duplicates from column containing list pandas
The dataframe's columns contain lists of string values. I need to reduce the dataframe to the rows whose list in the "Final" column is unique.
I have a dataframe like this:
string1 string2 Final
1 [abc,ncx] [qwe, rty] [apple, mango]
2 [uio,pas,dfg] [zxc,vbg,dfv] [banana,grapes, apple]
3 [ncx,abc] [rty,qwe] [mango,apple]
4 [uio,pas,dfg] [zxc,vbg,dfv] [banana,grapes, apple]
5 [uio,dfg] [zxc,dfv] [banana, apple]
6 [ncx,abc] [rty,qwe] [mango,apple]
Duplicate lists in df['Final'] must be dropped, so that the resulting dataframe contains only unique lists in the 'Final' column.
Desired output dataframe:
string1 string2 Final
1 [abc,ncx] [qwe, rty] [apple, mango]
2 [uio,pas,dfg] [zxc,vbg,dfv] [banana,grapes, apple]
3 [ncx,abc] [rty,qwe] [mango,apple]
4 [uio,dfg] [zxc,dfv] [banana, apple]
Build a mask with Series.duplicated and invert it with ~. Because lists are not hashable, first convert them to tuples, then filter with boolean indexing:
df = df[~df['Final'].apply(tuple).duplicated()]
print (df)
string1 string2 Final
1 [abc,ncx] [qwe,rty] [apple, mango]
2 [uio,pas,dfg] [zxc,vbg,dfv] [banana, grapes, apple]
3 [ncx,abc] [rty,qwe] [mango, apple]
5 [uio,dfg] [zxc,dfv] [banana, apple]
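For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample dataframe by hand. The values are copied from the question; the 1-based index and the variable names `as_tuples`, `mask` and `result` are my own choices for illustration.

```python
import pandas as pd

# Sample data copied from the question; index 1..6 is assumed to match the printout.
df = pd.DataFrame(
    {
        "string1": [["abc", "ncx"], ["uio", "pas", "dfg"], ["ncx", "abc"],
                    ["uio", "pas", "dfg"], ["uio", "dfg"], ["ncx", "abc"]],
        "string2": [["qwe", "rty"], ["zxc", "vbg", "dfv"], ["rty", "qwe"],
                    ["zxc", "vbg", "dfv"], ["zxc", "dfv"], ["rty", "qwe"]],
        "Final": [["apple", "mango"], ["banana", "grapes", "apple"], ["mango", "apple"],
                  ["banana", "grapes", "apple"], ["banana", "apple"], ["mango", "apple"]],
    },
    index=range(1, 7),
)

# Lists are unhashable, so build a hashable key per row first.
as_tuples = df["Final"].apply(tuple)

# True for every row whose key already appeared earlier (first occurrence is kept).
mask = as_tuples.duplicated()

# Keep only the rows that are not marked as duplicates.
result = df[~mask]
print(result)
```

Since tuples compare element by element, `[apple, mango]` and `[mango, apple]` count as different here, which is why row 3 survives and only rows 4 and 6 are dropped.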
If apple, mango should count as a duplicate of mango, apple (order does not matter), change tuple to frozenset:
df = df[~df['Final'].apply(frozenset).duplicated()]
print (df)
string1 string2 Final
1 [abc,ncx] [qwe,rty] [apple, mango]
2 [uio,pas,dfg] [zxc,vbg,dfv] [banana, grapes, apple]
5 [uio,dfg] [zxc,dfv] [banana, apple]
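One caveat with frozenset: it also collapses repeated items inside a single list, so `[apple, apple, mango]` would be treated as equal to `[apple, mango]`. If that matters for your data, a sorted tuple is one possible alternative key. The sketch below uses a small made-up dataframe (not the one from the question) to show the difference; the names `frozen_keys` and `sorted_keys` are just for illustration.

```python
import pandas as pd

# Hypothetical data chosen to show where the two keys disagree.
df = pd.DataFrame(
    {
        "Final": [
            ["apple", "mango"],
            ["mango", "apple"],           # same items, different order
            ["apple", "apple", "mango"],  # same items, but apple repeated
            ["banana", "apple"],
        ]
    }
)

# Order-insensitive key that ignores how often an item occurs.
frozen_keys = df["Final"].apply(frozenset)

# Order-insensitive key that still distinguishes repeated items.
sorted_keys = df["Final"].apply(lambda items: tuple(sorted(items)))

by_frozenset = df[~frozen_keys.duplicated()]
by_sorted = df[~sorted_keys.duplicated()]

print(by_frozenset)
print(by_sorted)
```

With the frozenset key, row 2 is dropped as a duplicate of row 0; with the sorted tuple key it is kept, because the repeated apple makes the key different. Note that sorting requires the list items to be mutually comparable, which holds for plain strings.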