Python: How to compare two data frames
I have two data frames:
df1
A1 B1
1 a
2 s
3 d
and
df2
A1 B1
1 a
2 x
3 d
I want to compare df1 and df2 on column B1. The column A1 can be used to join. I want to know:
I tried using merge and join, but that is not what I am looking for.
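For the question's original data, where both frames share the same A1 keys (1 to 3), one possible sketch is to skip merging entirely: index both frames by A1 and let pandas align the two B1 columns by key. This is an alternative approach, not the answer's method; the variable names are illustrative. Note that Series.ne aligns on the index, so a key present in only one frame would also come out as True.

```python
import pandas as pd

# The question's original data: both frames share the same A1 keys.
df1 = pd.DataFrame({'A1': [1, 2, 3], 'B1': ['a', 's', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3], 'B1': ['a', 'x', 'd']})

# Use A1 as the index so the two B1 columns align by key, not by row position.
b1_left = df1.set_index('A1')['B1']
b1_right = df2.set_index('A1')['B1']

# Series.ne aligns on the index before comparing; True marks a differing B1 value.
differs = b1_left.ne(b1_right)
print(differs[differs].index.tolist())  # [2]
```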
I've edited the raw data to illustrate the case of A1 keys that exist in one dataframe but not the other.
When merging, specify an 'outer' merge so that rows whose A1 key exists in only one dataframe are kept rather than dropped.
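To see why the merge type matters, here is a small sketch on the edited data used below, comparing merge's default how='inner' with how='outer'. The inner merge silently drops keys 4 and 5, which are exactly the rows you want to inspect.

```python
import pandas as pd

df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})

# Inner merge keeps only keys present in both frames.
inner = df1.merge(df2, how='inner', on='A1')
# Outer merge keeps the union of keys, filling the gaps with NaN.
outer = df1.merge(df2, how='outer', on='A1')

print(sorted(inner['A1'].tolist()))  # [1, 2, 3]
print(sorted(outer['A1'].tolist()))  # [1, 2, 3, 4, 5]
```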
I've included the suffixes '_1' and '_2' to indicate which dataframe each B1 column comes from (_1 = df1 and _2 = df2).
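For reference, if suffixes is omitted, pandas applies its default suffixes ('_x', '_y') to overlapping non-key columns, so the two B1 columns would be named B1_x and B1_y. A quick check, assuming the same edited data:

```python
import pandas as pd

df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})

# No suffixes argument: the overlapping B1 columns get the default '_x' / '_y'.
default_cols = list(df1.merge(df2, how='outer', on='A1').columns)
# Explicit suffixes make the source of each column obvious.
custom_cols = list(df1.merge(df2, how='outer', on='A1', suffixes=('_1', '_2')).columns)

print(default_cols)  # ['A1', 'B1_x', 'B1_y']
print(custom_cols)   # ['A1', 'B1_1', 'B1_2']
```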
import pandas as pd

df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})
df3 = df1.merge(df2, how='outer', on='A1', suffixes=['_1', '_2'])
df3['check'] = df3.B1_1 == df3.B1_2
>>> df3
A1 B1_1 B1_2 check
0 1 a a True
1 2 b d False
2 3 c c True
3 4 d NaN False
4 5 NaN e False
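Note that check is False both for a genuine mismatch (A1 = 2) and for a missing key (A1 = 4 or 5), because NaN never compares equal to anything. If you need to tell these cases apart, one possible sketch uses numpy.select to build a hypothetical status column. It assumes B1 contains no NaN values in the original frames; otherwise a real NaN would be mistaken for a missing key.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})
df3 = df1.merge(df2, how='outer', on='A1', suffixes=('_1', '_2'))

# Conditions are checked in order; the first one that is True wins.
conditions = [
    df3['B1_1'].isna(),          # key only in df2
    df3['B1_2'].isna(),          # key only in df1
    df3['B1_1'] == df3['B1_2'],  # key in both, same B1
]
choices = ['missing in df1', 'missing in df2', 'same']

# 'status' is an illustrative column name; anything unmatched is a real difference.
df3['status'] = np.select(conditions, choices, default='different')
print(df3)
```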
To check for A1 keys missing from df1 or df2:
# A1 value missing in `df1`
>>> df3[df3.B1_1.isnull()]
A1 B1_1 B1_2 check
4 5 NaN e False
# A1 value missing in `df2`
>>> df3[df3.B1_2.isnull()]
A1 B1_1 B1_2 check
3 4 d NaN False
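If you only need the missing keys themselves, a merge is not strictly required. A minimal sketch using Series.isin on the A1 columns directly (an alternative technique to the merge-based filtering above; the variable names are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})

# Keys in df1 that never appear in df2, and vice versa.
missing_in_df2 = df1.loc[~df1['A1'].isin(df2['A1']), 'A1'].tolist()
missing_in_df1 = df2.loc[~df2['A1'].isin(df1['A1']), 'A1'].tolist()

print(missing_in_df2)  # [4]
print(missing_in_df1)  # [5]
```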
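EDIT: Thanks to @EdChum (the source of all Pandas knowledge...). Passing indicator=True to merge adds a _merge column that records whether each key was found in both frames, only in the left frame (df1), or only in the right frame (df2).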
df3 = df1.merge(df2, how='outer', on='A1', suffixes=['_1', '_2'], indicator=True)
df3['check'] = df3.B1_1 == df3.B1_2
>>> df3
A1 B1_1 B1_2 _merge check
0 1 a a both True
1 2 b d both False
2 3 c c both True
3 4 d NaN left_only False
4 5 NaN e right_only False
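With the _merge column in place, each question can be answered by filtering on its value. A sketch assuming the same edited data; only_df1, only_df2 and changed are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})
df3 = df1.merge(df2, how='outer', on='A1', suffixes=('_1', '_2'), indicator=True)

# Keys found in only one of the two frames.
only_df1 = df3.loc[df3['_merge'] == 'left_only', 'A1'].tolist()
only_df2 = df3.loc[df3['_merge'] == 'right_only', 'A1'].tolist()

# Keys found in both frames whose B1 values disagree.
both = df3[df3['_merge'] == 'both']
changed = both.loc[both['B1_1'] != both['B1_2'], 'A1'].tolist()

print(only_df1, only_df2, changed)  # [4] [5] [2]
```

Restricting the B1 comparison to the 'both' rows avoids the NaN comparisons that make check ambiguous.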