How to remove duplicates from lists of lists in a pandas DataFrame

Question

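I have the following DataFrame. I want to compare two columns that each hold a list of lists, remove the duplicates, and merge the two into one. I tried the logic below, but it raises "TypeError: unhashable type: 'list'".

DataFrame: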

import pandas as pd

df = pd.DataFrame({'col1': [[[1452, 5099], [1418, 499]], [[1427, 55099]]],
                   'col2': [[[1452, 5099], [1417, 490]], [[1317, 55010]]]})
df
         col1                                    col2
0   [[1452, 5099], [1418, 499]]       [[1452, 5099], [1417, 490]]
1   [[1427, 55099]]                   [[1317, 55010]]

res =  [list(set(l1).union(l2) - set(l1).intersection(l2)) for l1, l2 in zip(df['col1'].tolist(), df['col2'].tolist())]
print(res)

Error:

TypeError: unhashable type: 'list'

Expected output:

res = [[[1452, 5099], [1418, 499],[1417, 490]], [[1427, 55099],[1317, 55010]]]
df['result']=res
print(df)
            col1                                  col2                   result
    0   [[1452, 5099], [1418, 499]]   [[1452, 5099], [1417, 490]]    [[1452, 5099], [1418, 499], [1417, 490]]
    1   [[1427, 55099]]               [[1317, 55010]]                [[1427, 55099], [1317, 55010]]

Answer 1

You need to temporarily convert the inner lists to tuples so that they can be hashed.
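To illustrate why the conversion is needed, here is a minimal sketch (the variable names `pairs`, `raised`, and `unique` are illustrative and not from the question). A set only accepts hashable items, so inner lists are rejected while the equivalent tuples are accepted:

```python
pairs = [[1452, 5099], [1418, 499], [1452, 5099]]

# Lists are mutable and therefore unhashable, so building a set from them fails.
try:
    set(pairs)
    raised = False
except TypeError:
    raised = True

# Tuples are hashable, so the same data can be deduplicated once converted.
unique = set(map(tuple, pairs))

print(raised)       # True
print(len(unique))  # 2
```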

The cleanest approach is probably a helper function:

def merge(list_of_lists):
    seen = set()
    out = []
    for l in list_of_lists:
        for item in l:
            t = tuple(item)
            if t not in seen:
                out.append(item)
                seen.add(t)
    return out

df['result'] = [merge(l) for l in zip(df['col1'], df['col2'])]

A hackier and less readable approach is to use an intermediate dictionary as a container:

df['result'] = [list({tuple(x): x for l in lst for x in l}.values())
                for lst in zip(df['col1'], df['col2'])]

Output:

                          col1                         col2                                    result
0  [[1452, 5099], [1418, 499]]  [[1452, 5099], [1417, 490]]  [[1452, 5099], [1418, 499], [1417, 490]]
1              [[1427, 55099]]              [[1317, 55010]]            [[1427, 55099], [1317, 55010]]

Answer 2

Add the two columns (which concatenates the lists row by row), then map each element to a 2-tuple and use a set to remove the duplicates:

df['res'] = df.col1 + df.col2

df['res'] = [list(set(map(tuple, x))) for x in df['res']]
#df.res:
#0    [(1452, 5099), (1417, 490), (1418, 499)]
#1              [(1317, 55010), (1427, 55099)]
#Name: res, dtype: object
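Note that the set-based result holds tuples rather than lists, and a set does not guarantee any particular order. If the expected output from the question matters (lists, in order of first appearance), one possible variation is sketched below. It assumes Python 3.7+, where dicts keep insertion order; the helper name `dedupe_keep_order` and the plain lists standing in for the two columns are hypothetical.

```python
def dedupe_keep_order(pairs):
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so this drops repeats while preserving first-seen order.
    return [list(t) for t in dict.fromkeys(map(tuple, pairs))]

# Plain lists standing in for df['col1'] and df['col2'] from the question.
col1 = [[[1452, 5099], [1418, 499]], [[1427, 55099]]]
col2 = [[[1452, 5099], [1417, 490]], [[1317, 55010]]]

# a + b concatenates the two lists of one row before deduplicating.
res = [dedupe_keep_order(a + b) for a, b in zip(col1, col2)]
print(res)
```

With the DataFrame from the question, the same comprehension could be written as `df['res'] = [dedupe_keep_order(a + b) for a, b in zip(df['col1'], df['col2'])]`.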

Answer 1: 2, accepted, 2022-09-28 15:09:34

Answer 2: 1, 2022-09-28 15:14:31
