Find the difference between two data frames that have list values
I have two dataframes whose column values are lists.
I want to find the difference between two such dataframes.
I can do this if the values are scalars, using pd.concat():
difference_dbs2 = pd.concat([dic_asset2,dic_aut2]).drop_duplicates(keep=False)
But if the values are lists, I get the following error:
>>> TypeError: unhashable type: 'list'
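The error comes from drop_duplicates: to compare rows it has to hash them, and Python lists are unhashable. A minimal sketch, assuming a whole-row comparison is enough, converts the list column to tuples before concatenating. Note that this reports entire rows that differ, not individual list elements, so for the example data it returns both rows rather than [2]:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["S"], "B": [[1, 2, 3]]})
df2 = pd.DataFrame({"A": ["S"], "B": [[1, 3, 5]]})

# drop_duplicates hashes every row, and lists are unhashable,
# so turn each list cell into a (hashable) tuple first.
t1 = df1.assign(B=df1["B"].map(tuple))
t2 = df2.assign(B=df2["B"].map(tuple))

# keep=False drops every row that occurs more than once in the combined
# frame, leaving the rows that appear in only one of the two frames.
row_diff = pd.concat([t1, t2]).drop_duplicates(keep=False)
print(row_diff)
```

A side effect of keep=False is that a row duplicated inside a single frame is also dropped.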
Example dfs:
import pandas as pd
df1=pd.DataFrame({"A": ["S"], "B": [[1,2,3]]})
df2=pd.DataFrame({"A": ["S"], "B": [[1,3,5]]})
The expected output should be:
   A    B
0  S  [2]
That is, the values which are in df1 but not in df2.
The real data looks like this:
HOSTNAME DATABASE
EU2XXXXXXX [ASSAS, ASAS, FSSD, DSD...]
This will do the trick. A few assumptions: (1) A represents a partition, i.e. we want the difference of B per unique A.
(2) You want the difference in the set-theory sense, i.e. [1,2,3] - [1,3,5] = [2].
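Under those two assumptions, a more direct sketch (a swapped-in technique, not the code in this answer) merges the frames on A and takes the set difference row by row. list_difference is a hypothetical helper name, and the sketch assumes each A value appears at most once per frame; otherwise the merge pairs every matching row:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["S"], "B": [[1, 2, 3]]})
df2 = pd.DataFrame({"A": ["S"], "B": [[1, 3, 5]]})

# Align the two frames on the partition key A. Keys missing from df2
# get NaN in B_2, which is treated as an empty list below.
merged = df1.merge(df2, on="A", how="left", suffixes=("_1", "_2"))

def list_difference(left, right):
    """Hypothetical helper: elements of `left` not in `right`, in `left`'s order."""
    right_set = set(right) if isinstance(right, list) else set()
    return [x for x in left if x not in right_set]

result = pd.DataFrame({
    "A": merged["A"],
    "B": [list_difference(l, r) for l, r in zip(merged["B_1"], merged["B_2"])],
})
print(result)
```

Unlike the merge/indicator approach, a key whose list is fully contained in df2 stays in the result with an empty list.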
import pandas as pd
df1=pd.DataFrame({"A": ["S"], "B": [[1,2,3]]})
df2=pd.DataFrame({"A": ["S"], "B": [[1,3,5]]})
# Expand each list into one row per element: columns "A" and 0
df1=df1.set_index("A")["B"].apply(pd.Series).stack().reset_index(level=1, drop=True).reset_index()
df2=df2.set_index("A")["B"].apply(pd.Series).stack().reset_index(level=1, drop=True).reset_index()
# Outer merge on (A, element); the _merge column records where each pair came from
df3=pd.merge(df1,df2,on=list(df1.columns),how="outer",indicator=True)
# Keep pairs found only in df1, then regroup them into one list per A
df3=df3.loc[df3['_merge']=='left_only'].drop("_merge", axis=1).groupby("A", as_index=False).agg(list).rename(columns={0: "B"})
print(df3)
Output:
   A    B
0  S  [2]
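The same merge/indicator idea can be sketched with DataFrame.explode (pandas 0.25+) in place of apply(pd.Series).stack(). With lists of unequal length, apply(pd.Series) pads with NaN, which can upcast integer elements to floats, whereas explode keeps the original values. diff_list_column is a hypothetical helper name; for the real data it would be called with key="HOSTNAME", col="DATABASE":

```python
import pandas as pd

def diff_list_column(left, right, key, col):
    """Hypothetical helper: per `key`, the elements of list column `col`
    that appear in `left` but not in `right`."""
    # explode() puts each list element on its own row, repeating the key.
    long_left = left[[key, col]].explode(col)
    long_right = right[[key, col]].explode(col)
    # indicator=True adds a _merge column saying whether each (key, element)
    # pair came from the left frame only, the right frame only, or both.
    merged = long_left.merge(long_right, on=[key, col], how="outer", indicator=True)
    only_left = merged.loc[merged["_merge"] == "left_only", [key, col]]
    # Collect the remaining elements back into one list per key.
    return only_left.groupby(key, as_index=False).agg(list)

df1 = pd.DataFrame({"A": ["S"], "B": [[1, 2, 3]]})
df2 = pd.DataFrame({"A": ["S"], "B": [[1, 3, 5]]})
res = diff_list_column(df1, df2, key="A", col="B")
print(res)
```

As in the answer's version, a key whose elements all appear in the right frame drops out of the result entirely instead of showing an empty list.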