
Filter a pandas DataFrame by removing duplicates from a column containing lists

A DataFrame column contains lists of string values. The DataFrame needs to be reduced to the rows whose list in the 'Final' column is unique.

I have a dataframe as follows:

    string1           string2           Final
1   [abc,ncx]       [qwe, rty]        [apple, mango]
2   [uio,pas,dfg]   [zxc,vbg,dfv]     [banana,grapes, apple]
3   [ncx,abc]       [rty,qwe]         [mango,apple]
4   [uio,pas,dfg]   [zxc,vbg,dfv]     [banana,grapes, apple]
5   [uio,dfg]        [zxc,dfv]        [banana, apple]
6   [ncx,abc]       [rty,qwe]         [mango,apple]

Duplicate lists in the df['Final'] column must be removed, so that the resulting dataframe contains only unique lists in the 'Final' column.

Required output dataframe:

     string1           string2           Final
1   [abc,ncx]       [qwe, rty]        [apple, mango]
2   [uio,pas,dfg]   [zxc,vbg,dfv]     [banana,grapes, apple]
3   [ncx,abc]       [rty,qwe]         [mango,apple]
4   [uio,dfg]        [zxc,dfv]        [banana, apple]

Use the mask created by Series.duplicated, inverted with ~. Because lists are not hashable, first convert them to tuples, then filter with boolean indexing:

df = df[~df['Final'].apply(tuple).duplicated()]
print (df)
         string1        string2                    Final
1      [abc,ncx]      [qwe,rty]           [apple, mango]
2  [uio,pas,dfg]  [zxc,vbg,dfv]  [banana, grapes, apple]
3      [ncx,abc]      [rty,qwe]           [mango, apple]
5      [uio,dfg]      [zxc,dfv]          [banana, apple]
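
For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. It assumes the list elements are plain strings and rebuilds the sample data from the table in the question; the names is_dup and result are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: every element is a string).
# Index labels 1..6 match the row labels shown in the post.
df = pd.DataFrame(
    {
        "string1": [["abc", "ncx"], ["uio", "pas", "dfg"], ["ncx", "abc"],
                    ["uio", "pas", "dfg"], ["uio", "dfg"], ["ncx", "abc"]],
        "string2": [["qwe", "rty"], ["zxc", "vbg", "dfv"], ["rty", "qwe"],
                    ["zxc", "vbg", "dfv"], ["zxc", "dfv"], ["rty", "qwe"]],
        "Final": [["apple", "mango"], ["banana", "grapes", "apple"],
                  ["mango", "apple"], ["banana", "grapes", "apple"],
                  ["banana", "apple"], ["mango", "apple"]],
    },
    index=range(1, 7),
)

# Lists are unhashable, so map each list to a tuple before checking duplicates.
# duplicated() marks every repeat after the first occurrence as True.
is_dup = df["Final"].apply(tuple).duplicated()

# Invert the mask to keep only the first occurrence of each distinct list.
result = df[~is_dup]
print(result.index.tolist())  # [1, 2, 3, 5]
```

Rows 4 and 6 are dropped because their tuples repeat those of rows 2 and 3 exactly. Row 3 is kept, since (mango, apple) and (apple, mango) are different tuples.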

If apple, mango should count as a duplicate of mango, apple (order does not matter), change tuple to frozenset:

df = df[~df['Final'].apply(frozenset).duplicated()]
print (df)
         string1        string2                    Final
1      [abc,ncx]      [qwe,rty]           [apple, mango]
2  [uio,pas,dfg]  [zxc,vbg,dfv]  [banana, grapes, apple]
5      [uio,dfg]      [zxc,dfv]          [banana, apple]
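
One thing to be aware of with the order-insensitive approach: a frozenset also discards repeated elements inside a list, so [a, a, b] and [a, b] would be treated as the same key. The sketch below is not part of the original answer. It reruns the frozenset filter on the 'Final' values from the question, then shows a sorted-tuple key as a possible alternative if repeat counts should matter. The names final, keep_unordered and keep_sorted are hypothetical.

```python
import pandas as pd

# 'Final' column rebuilt from the question (assumption: elements are strings).
final = pd.Series(
    [["apple", "mango"], ["banana", "grapes", "apple"], ["mango", "apple"],
     ["banana", "grapes", "apple"], ["banana", "apple"], ["mango", "apple"]],
    index=range(1, 7),
    name="Final",
)

# Order-insensitive key: frozenset ignores element order (and repeat counts).
keep_unordered = ~final.apply(frozenset).duplicated()
print(final[keep_unordered].index.tolist())  # [1, 2, 5]

# Alternative key: a sorted tuple ignores order but preserves repeat counts.
keep_sorted = ~final.apply(lambda items: tuple(sorted(items))).duplicated()
print(final[keep_sorted].index.tolist())  # [1, 2, 5]

# The two keys only differ when a list contains repeated elements.
print(frozenset(["a", "a", "b"]) == frozenset(["a", "b"]))          # True
print(tuple(sorted(["a", "a", "b"])) == tuple(sorted(["a", "b"])))  # False
```

On the sample data both keys give the same rows, because no list contains a repeated element. The sorted-tuple variant requires the elements to be mutually comparable, which holds for strings.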

