Finding the difference between two dataframes with list-valued columns

Question

I have two dataframes whose column values are lists.

I want to find the difference between two such dataframes.

I can do this if the values are scalars.

Using pd.concat():

difference_dbs2 = pd.concat([dic_asset2,dic_aut2]).drop_duplicates(keep=False)

But if the values are lists, I get this error:

>>> TypeError: unhashable type: 'list'

Example dataframes:

df1=pd.DataFrame({"A": ["S"], "B": [[1,2,3]]})
df2=pd.DataFrame({"A": ["S"], "B": [[1,3,5]]})

The expected output is:

>>>    A    B
    0  S  [2]

That is, the values that are in df1 but not in df2.

The real data looks like this:

HOSTNAME    DATABASE                                                     
EU2XXXXXXX  [ASSAS, ASAS, FSSD, DSD...]

Answer 1

This will do the trick, given a few assumptions:

(1) A is the partition key, i.e. we want the difference of B for each unique value of A.

(2) You want the difference in the set-theoretic sense, i.e. [1,2,3]-[1,3,5]=[2].
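As a quick illustration of assumption (2), here is what the set difference means for a single pair of lists in plain Python. The names left, right and diff are only for this sketch, and the result is sorted because sets do not preserve order.

```python
# Set difference: elements of the first list that are absent from the second.
left = [1, 2, 3]
right = [1, 3, 5]

# sorted() gives a stable order, since sets are unordered.
diff = sorted(set(left) - set(right))
print(diff)
```

The pandas code applies the same idea per value of A.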

import pandas as pd

df1=pd.DataFrame({"A": ["S"], "B": [[1,2,3]]})
df2=pd.DataFrame({"A": ["S"], "B": [[1,3,5]]})

# Expand each list into one row per element, keyed by A
df1=df1.set_index("A")["B"].apply(pd.Series).stack().reset_index(level=1, drop=True).reset_index()
df2=df2.set_index("A")["B"].apply(pd.Series).stack().reset_index(level=1, drop=True).reset_index()

# Outer merge with an indicator column marking where each (A, element) pair came from
df3=pd.merge(df1,df2,on=list(df1.columns),how="outer",indicator=True)

# Keep the pairs found only in df1 and collect them back into one list per A
df3=df3.loc[df3['_merge']=='left_only'].drop("_merge", axis=1).groupby("A", as_index=False).agg(list).rename(columns={0: "B"})

Output:

   A    B
0  S  [2]
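
A possible variation, not part of the answer above: assuming pandas 0.25 or newer, DataFrame.explode can do the list-to-rows step directly. This avoids apply(pd.Series), which pads lists of unequal length with NaN and can turn integers into floats. The names long1, long2, merged, only_left and result are made up for this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["S"], "B": [[1, 2, 3]]})
df2 = pd.DataFrame({"A": ["S"], "B": [[1, 3, 5]]})

# One row per list element; the element values themselves are left unchanged.
long1 = df1.explode("B")
long2 = df2.explode("B")

# Anti-join: a left merge with an indicator, keeping pairs found only in df1.
# drop_duplicates() on the right side prevents repeated elements from multiplying rows.
merged = long1.merge(long2.drop_duplicates(), on=["A", "B"], how="left", indicator=True)
only_left = merged.loc[merged["_merge"] == "left_only", ["A", "B"]]

# Collect the remaining elements back into one list per A.
result = only_left.groupby("A", as_index=False).agg(list)
print(result)
```

For the example frames this gives a single row with A equal to S and B equal to [2]. Any value of A whose list in df1 is fully covered by df2 drops out of the result, which is the same behaviour as the merge-based code above.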

Answer 1
0 2019-12-26 14:43:02
