Compare and equalize two data frames in Python
I have the following dataframes:
df1
C1 C2
F56 345
G45 65
H13 56
H67 578
Y78 64
df2
C1 C2
A34 10
F56 345
H13 56
Y78 64
I want to compare the two dataframes above. If df1 contains a value in C1 that is not present in df2, or vice versa, I want to add a new row for the missing value with its C2 value set to 0. The resulting dataframes should look like this:
df1
C1 C2
A34 0
F56 345
G45 65
H13 56
H67 578
Y78 64
df2
C1 C2
A34 10
F56 345
G45 0
H13 56
H67 0
Y78 64
I'd appreciate any recommendations.
This is a great use case for DataFrame.merge: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.merge.html
What is nice about merge is that the join types (left, right, inner, outer) will be familiar if you've spent any time with a relational database.
The indicator parameter is of particular interest to you in this case:
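# Outer join keeps every C1 value that appears in either frame.
# indicator=True adds a _merge column saying which frame each row came from.
result_df1 = df1.merge(
    df2,
    how="outer",
    on="C1",
    indicator=True,
    suffixes=("", "_df2"),
)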
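To see what the indicator column reports, here is a small sketch that rebuilds the two frames from the question and runs the same merge. The literal construction of df1 and df2 is an assumption about how the data was loaded; only the merge call comes from the answer.

```python
import pandas as pd

# Assumed construction of the frames shown in the question.
df1 = pd.DataFrame({"C1": ["F56", "G45", "H13", "H67", "Y78"],
                    "C2": [345, 65, 56, 578, 64]})
df2 = pd.DataFrame({"C1": ["A34", "F56", "H13", "Y78"],
                    "C2": [10, 345, 56, 64]})

merged = df1.merge(
    df2,
    how="outer",
    on="C1",
    indicator=True,
    suffixes=("", "_df2"),
)

# Map each C1 key to its indicator value for easy inspection.
# A34 is 'right_only', G45 and H67 are 'left_only', the rest are 'both'.
indicator_by_key = merged.set_index("C1")["_merge"].astype(str).to_dict()
print(indicator_by_key)
```

The merged frame has the columns C1, C2 (from df1), C2_df2 (from df2) and _merge, so the rows missing from df1 are exactly those marked right_only.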
The rows that come back with np.nan in column C2 from this join are the ones you'll want to fill with 0; after that, drop the extra columns we've introduced (C2_df2 and _merge).
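Putting the steps together, one possible sketch looks like the following. The helper name equalize and the "_other" suffix are hypothetical, the frame construction is assumed from the question, and the cast back to int assumes C2 holds whole numbers. It also assumes C1 has no duplicate keys, since a merge would otherwise multiply rows.

```python
import pandas as pd

# Assumed construction of the frames shown in the question.
df1 = pd.DataFrame({"C1": ["F56", "G45", "H13", "H67", "Y78"],
                    "C2": [345, 65, 56, 578, 64]})
df2 = pd.DataFrame({"C1": ["A34", "F56", "H13", "Y78"],
                    "C2": [10, 345, 56, 64]})


def equalize(left, right):
    """Return `left` plus a C2 == 0 row for each C1 found only in `right`.

    Hypothetical helper, not part of pandas.
    """
    merged = left.merge(
        right,
        how="outer",
        on="C1",
        indicator=True,
        suffixes=("", "_other"),
    )
    # Keys that exist only in `right` have NaN in the left-hand C2 column.
    merged["C2"] = merged["C2"].fillna(0).astype(int)
    # Drop the helper columns introduced by the merge.
    merged = merged.drop(columns=["C2_other", "_merge"])
    # Sort explicitly rather than relying on the merge's row order.
    return merged.sort_values("C1").reset_index(drop=True)


result_df1 = equalize(df1, df2)
result_df2 = equalize(df2, df1)
print(result_df1)
print(result_df2)
```

Calling the helper once in each direction gives both frames the same set of C1 values, which is what the question asks for.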