Add two data frames of different sizes by date

I have two data frames that I need to add together.

The two data frames could look something like this:

df1 =

    date       col1    col2
    01-01-20   1       2
    02-01-20   2       4
    03-01-20   3       6
    04-01-20   4       8
    05-01-20   5       10

df2 =

    date       col1    col2
    03-01-20   1       2
    04-01-20   2       4
    05-01-20   3       6

What I am currently doing is just:

df_sum = df1.set_index("date") + df2.set_index("date")

which returns:

df_sum =

    01-01-20   NaN     NaN
    02-01-20   NaN     NaN
    03-01-20   4       8
    04-01-20   6       12
    05-01-20   8       16

But what I would like instead is:

df_sum_correct =

    01-01-20   1       2
    02-01-20   2       4
    03-01-20   4       8
    04-01-20   6       12
    05-01-20   8       16

In other words, for dates that appear in only one of the data frames, the result should keep the values from the data frame that actually has values for that date, instead of turning every value in those rows into NaN.

How can this be done?

Use DataFrame.add with the fill_value parameter:

df_sum = df1.set_index("date").add(df2.set_index("date"), fill_value=0)
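As a minimal, self-contained sketch of the line above, the following rebuilds the two example frames from the question (with the dates kept as plain strings, which is an assumption about the real data) and adds them with fill_value=0. Dates present in only one frame are treated as 0 in the other, so their values carry through unchanged.

```python
import pandas as pd

# Example data copied from the question; dates are kept as strings here.
df1 = pd.DataFrame({
    "date": ["01-01-20", "02-01-20", "03-01-20", "04-01-20", "05-01-20"],
    "col1": [1, 2, 3, 4, 5],
    "col2": [2, 4, 6, 8, 10],
})
df2 = pd.DataFrame({
    "date": ["03-01-20", "04-01-20", "05-01-20"],
    "col1": [1, 2, 3],
    "col2": [2, 4, 6],
})

# Align both frames on "date"; a cell missing on one side is replaced by 0
# before the addition, so it no longer produces NaN.
df_sum = df1.set_index("date").add(df2.set_index("date"), fill_value=0)
print(df_sum)
```

Because the alignment step introduces missing values internally, the result will typically come back as floats; appending .astype(int) is one way to restore integers if that matters. Note also that fill_value only helps when one side is missing: a cell that is missing in both frames stays NaN.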

Or use concat and aggregate with sum:

df_sum = pd.concat([df1, df2]).groupby("date").sum()
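The concat approach can be sketched the same way, again assuming the question's example data with string dates. The frames are stacked on top of each other, and rows sharing a date are then summed, so a date that occurs only once simply keeps its own values.

```python
import pandas as pd

# Example data copied from the question; dates are kept as strings here.
df1 = pd.DataFrame({
    "date": ["01-01-20", "02-01-20", "03-01-20", "04-01-20", "05-01-20"],
    "col1": [1, 2, 3, 4, 5],
    "col2": [2, 4, 6, 8, 10],
})
df2 = pd.DataFrame({
    "date": ["03-01-20", "04-01-20", "05-01-20"],
    "col1": [1, 2, 3],
    "col2": [2, 4, 6],
})

# Stack the rows of both frames, then sum all rows that share a date.
stacked = pd.concat([df1, df2])
df_sum = stacked.groupby("date").sum()
print(df_sum)
```

Two differences from the fill_value variant are worth keeping in mind. No missing values are introduced here, so integer columns stay integers. And groupby also sums duplicate dates within a single frame, which may or may not be what you want. The groups are sorted by the date strings, which is not necessarily chronological order; parsing the column with pd.to_datetime and an explicit format first would avoid that.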

In addition to @jezrael's answer, to get the DataFrame back instead of the summed values, you can use the following:

df_correct = pd.concat([df1, df2]).groupby("date").apply(lambda df: df[:])
