How to remove duplicates from a list of lists in a pandas DataFrame
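I have the following DataFrame. I want to compare two columns that contain lists of lists, remove the duplicates, and then merge the two into one. I'm trying the logic below, but it throws the error "TypeError: unhashable type: 'list'".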
DataFrame:
import pandas as pd

df = pd.DataFrame({'col1': [[[1452, 5099], [1418, 499]], [[1427, 55099]]],
                   'col2': [[[1452, 5099], [1417, 490]], [[1317, 55010]]]})
df
col1 col2
0 [[1452, 5099], [1418, 499]] [[1452, 5099], [1417, 490]]
1 [[1427, 55099]] [[1317, 55010]]
res = [list(set(l1).union(l2) - set(l1).intersection(l2)) for l1, l2 in zip(df['col1'].tolist(), df['col2'].tolist())]
print(res)
Error:
TypeError: unhashable type: 'list'
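The error comes from `set(l1)`: each element of `l1` is itself a list, and Python lists are mutable and therefore unhashable, so they cannot be set members. A minimal reproduction, using the first row of `col1` copied by hand rather than read from the DataFrame:

```python
row = [[1452, 5099], [1418, 499]]  # first row of col1, copied by hand

try:
    set(row)  # the elements are lists, so this raises TypeError
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unhashable type: 'list'

# Tuples are immutable and hashable, so converting each pair first works
pairs = set(map(tuple, row))
print(pairs)
```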
Expected output:
res = [[[1452, 5099], [1418, 499],[1417, 490]], [[1427, 55099],[1317, 55010]]]
df['result']=res
print(df)
col1 col2 result
0 [[1452, 5099], [1418, 499]] [[1452, 5099], [1417, 490]] [[1452, 5099], [1418, 499], [1417, 490]]
1 [[1427, 55099]] [[1317, 55010]] [[1427, 55099], [1317, 55010]]
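Note that the expected `result` keeps `[1452, 5099]`, which appears in both columns, so what is wanted is the union of the two lists. The attempted expression computes the union minus the intersection (the symmetric difference), so even with tuples it would drop that shared pair. A quick check on the first row, with the values copied by hand and converted to tuples so the sets can be built:

```python
col1 = [[1452, 5099], [1418, 499]]
col2 = [[1452, 5099], [1417, 490]]

s1 = set(map(tuple, col1))
s2 = set(map(tuple, col2))

sym_diff = s1.union(s2) - s1.intersection(s2)  # the question's logic, on tuples
union = s1 | s2                                 # what the expected output needs

print(sorted(sym_diff))  # [(1417, 490), (1418, 499)]
print(sorted(union))     # [(1417, 490), (1418, 499), (1452, 5099)]
```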
You need to temporarily convert the lists to tuples so they can be hashed.
The cleanest way is probably a helper function:
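def merge(list_of_lists):
    seen = set()  # tuples already emitted
    out = []
    for l in list_of_lists:
        for item in l:
            t = tuple(item)  # lists are unhashable; tuples can go in a set
            if t not in seen:
                out.append(item)  # keep the original list, in first-seen order
                seen.add(t)
    return out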
df['result'] = [merge(l) for l in zip(df['col1'], df['col2'])]
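To check the helper without building the DataFrame, the same call can be made on plain Python lists that mirror the two columns (the `merge` definition is repeated so the snippet runs on its own). `zip` yields one `(col1_value, col2_value)` tuple per row, and `merge` walks both lists in order, so the first occurrence of each pair keeps its position and its list type:

```python
def merge(list_of_lists):
    seen = set()
    out = []
    for l in list_of_lists:
        for item in l:
            t = tuple(item)  # hashable key for the membership test
            if t not in seen:
                out.append(item)  # keep the original list object
                seen.add(t)
    return out

# Plain lists standing in for df['col1'] and df['col2']
col1 = [[[1452, 5099], [1418, 499]], [[1427, 55099]]]
col2 = [[[1452, 5099], [1417, 490]], [[1317, 55010]]]

result = [merge(row) for row in zip(col1, col2)]
print(result)
# [[[1452, 5099], [1418, 499], [1417, 490]], [[1427, 55099], [1317, 55010]]]
```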
A hackier and less readable approach is to use an intermediate dictionary as the container:
df['result'] = [list({tuple(x): x for l in lst for x in l}.values())
                for lst in zip(df['col1'], df['col2'])]
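This works because dict keys must be hashable and unique, and dicts preserve insertion order (guaranteed since Python 3.7). When a tuple key repeats, its value is overwritten but the key keeps its original position, so `.values()` returns one list per distinct pair in first-seen order. Checked here on plain lists that mirror the two columns:

```python
# Plain lists standing in for df['col1'] and df['col2']
col1 = [[[1452, 5099], [1418, 499]], [[1427, 55099]]]
col2 = [[[1452, 5099], [1417, 490]], [[1317, 55010]]]

result = [
    # tuple(x) is the dedup key; the list x itself is kept as the value
    list({tuple(x): x for lst in row for x in lst}.values())
    for row in zip(col1, col2)
]
print(result)
# [[[1452, 5099], [1418, 499], [1417, 490]], [[1427, 55099], [1317, 55010]]]
```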
output:
col1 col2 result
0 [[1452, 5099], [1418, 499]] [[1452, 5099], [1417, 490]] [[1452, 5099], [1418, 499], [1417, 490]]
1 [[1427, 55099]] [[1317, 55010]] [[1427, 55099], [1317, 55010]]
Add the columns (which concatenates the lists), then map each element to a 2-tuple and use set to remove the duplicates:
df['res'] = df.col1 + df.col2
df.res = [list(set(map(tuple, x))) for x in df.res]
#df.res:
#0 [(1452, 5099), (1417, 490), (1418, 499)]
#1 [(1317, 55010), (1427, 55099)]
#Name: res, dtype: object
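This version returns tuples rather than lists, and because sets are unordered the pairs can come back in a different order from the input, as the sample output shows. If order and list type matter, one option (swapped in here, not part of the answer above) is `dict.fromkeys`, which deduplicates while keeping the first occurrence of each key in order. A sketch on plain lists that mirror the two columns:

```python
# Plain lists standing in for df['col1'] and df['col2']
col1 = [[[1452, 5099], [1418, 499]], [[1427, 55099]]]
col2 = [[[1452, 5099], [1417, 490]], [[1317, 55010]]]

# Concatenate per row, as df.col1 + df.col2 does for list-valued columns
combined = [a + b for a, b in zip(col1, col2)]

# dict.fromkeys keeps the first occurrence of each tuple, in order;
# map(list, ...) turns the surviving tuples back into lists
result = [list(map(list, dict.fromkeys(map(tuple, x)))) for x in combined]
print(result)
# [[[1452, 5099], [1418, 499], [1417, 490]], [[1427, 55099], [1317, 55010]]]
```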