Compare and equalize two data frames in Python

I have the following dataframes:

df1

C1    C2
F56   345
G45   65
H13   56
H67   578
Y78   64

df2

C1    C2
A34   10
F56   345
H13   56
Y78   64
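To experiment with the answer below, the two frames can be rebuilt in pandas. This is a minimal sketch that assumes C1 holds string keys and C2 holds integers, as the tables suggest. The set differences show which C1 keys are missing on each side.

```python
import pandas as pd

# Rebuild the two example frames from the question
df1 = pd.DataFrame({
    "C1": ["F56", "G45", "H13", "H67", "Y78"],
    "C2": [345, 65, 56, 578, 64],
})
df2 = pd.DataFrame({
    "C1": ["A34", "F56", "H13", "Y78"],
    "C2": [10, 345, 56, 64],
})

# Keys present in one frame but not in the other
only_in_df1 = set(df1["C1"]) - set(df2["C1"])
only_in_df2 = set(df2["C1"]) - set(df1["C1"])
print(sorted(only_in_df1))  # ['G45', 'H67']
print(sorted(only_in_df2))  # ['A34']
```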

I want to compare the two dataframes above: if df1 contains a value in C1 that is not present in df2, or vice versa, I want to add a new row with the missing value and a corresponding C2 value of 0. The resulting dataframes should look like this:

df1

C1    C2
A34   0
F56   345
G45   65
H13   56
H67   578
Y78   64

df2

C1    C2
A34   10
F56   345
G45   0
H13   56
H67   0
Y78   64

I'd appreciate any recommendations.

This is a great use case for DataFrame.merge: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.merge.html

What is great about merge is that its join types (left, right, inner, outer) will feel familiar if you have spent any time with relational databases.
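As a sketch of the difference on the question's data: an inner join keeps only keys found in both frames, while an outer join keeps the union of keys and fills the missing side with NaN. The outer join is the one needed here, because keys unique to either frame must survive. The `_df1`/`_df2` suffixes are an arbitrary choice for this example.

```python
import pandas as pd

# Example frames from the question
df1 = pd.DataFrame({"C1": ["F56", "G45", "H13", "H67", "Y78"],
                    "C2": [345, 65, 56, 578, 64]})
df2 = pd.DataFrame({"C1": ["A34", "F56", "H13", "Y78"],
                    "C2": [10, 345, 56, 64]})

# Inner join (like SQL INNER JOIN): only keys present in both frames
inner = df1.merge(df2, how="inner", on="C1", suffixes=("_df1", "_df2"))

# Outer join (like SQL FULL OUTER JOIN): union of keys;
# the frame that lacks a key gets NaN in its C2 column
outer = df1.merge(df2, how="outer", on="C1", suffixes=("_df1", "_df2"))

print(sorted(inner["C1"]))  # ['F56', 'H13', 'Y78']
print(sorted(outer["C1"]))  # ['A34', 'F56', 'G45', 'H13', 'H67', 'Y78']
```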

The indicator parameter is of particular interest in this case:

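result_df1 = df1.merge(
    df2,
    how="outer",            # keep keys from both frames
    on="C1",                # join on the key column
    indicator=True,         # add a _merge column: left_only / right_only / both
    suffixes=("", "_df2"),  # df1's C2 keeps its name; df2's becomes C2_df2
)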

(Image: merge result)
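Since the result is shown only as an image, here is a minimal sketch that reruns the same merge call on the question's data and inspects what the indicator column records for each key. The `origin` dictionary is just an illustrative way to display it.

```python
import pandas as pd

df1 = pd.DataFrame({"C1": ["F56", "G45", "H13", "H67", "Y78"],
                    "C2": [345, 65, 56, 578, 64]})
df2 = pd.DataFrame({"C1": ["A34", "F56", "H13", "Y78"],
                    "C2": [10, 345, 56, 64]})

# Same call as in the answer
result_df1 = df1.merge(df2, how="outer", on="C1", indicator=True,
                       suffixes=("", "_df2"))

# _merge is categorical: 'left_only' (only in df1), 'right_only' (only in df2),
# or 'both'; convert to str for a plain dictionary
origin = dict(zip(result_df1["C1"], result_df1["_merge"].astype(str)))
print(origin)
print(list(result_df1.columns))  # ['C1', 'C2', 'C2_df2', '_merge']
```

Keys marked `right_only` (here A34) are exactly the rows with NaN in `C2`.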

In this join, the rows with np.nan in column C2 are the keys that exist only in df2. Fill those with 0, then drop the extra columns we introduced (C2_df2 and _merge). Running the same merge with df2 on the left gives the second frame.
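Putting the steps together, here is a sketch that applies merge, fill and drop to both frames. The helper name `fill_missing_keys` and the `_other` suffix are hypothetical choices for this example. The cast back to the original dtype is needed because the NaNs introduced by the outer join turn C2 into a float column.

```python
import pandas as pd

df1 = pd.DataFrame({"C1": ["F56", "G45", "H13", "H67", "Y78"],
                    "C2": [345, 65, 56, 578, 64]})
df2 = pd.DataFrame({"C1": ["A34", "F56", "H13", "Y78"],
                    "C2": [10, 345, 56, 64]})


def fill_missing_keys(left, right, key="C1", value="C2"):
    """Hypothetical helper: return a copy of `left` with a row (value 0)
    for every key that appears only in `right`."""
    merged = left.merge(right, how="outer", on=key, indicator=True,
                        suffixes=("", "_other"))
    # Keys found only in `right` have NaN in left's value column
    merged[value] = merged[value].fillna(0).astype(left[value].dtype)
    # Drop the extra columns the merge introduced
    merged = merged.drop(columns=[value + "_other", "_merge"])
    # Sort by key for a stable, readable order
    return merged.sort_values(key).reset_index(drop=True)


new_df1 = fill_missing_keys(df1, df2)
new_df2 = fill_missing_keys(df2, df1)
print(new_df1)
print(new_df2)
```

The sketch assumes C1 is unique within each frame, as in the question; with repeated keys, a merge produces one row per matching pair.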
