Add two data frames of different sizes by date
I have two data frames that I need to add together.
The two data frames could look something like this:
df1 =
date col1 col2
01-01-20 1 2
02-01-20 2 4
03-01-20 3 6
04-01-20 4 8
05-01-20 5 10
df2 =
date col1 col2
03-01-20 1 2
04-01-20 2 4
05-01-20 3 6
What I am currently doing is just:
df_sum = df1.set_index("date") + df2.set_index("date")
which returns:
df_sum =
01-01-20 NaN NaN
02-01-20 NaN NaN
03-01-20 4 8
04-01-20 6 12
05-01-20 8 16
But what I would like instead is:
df_sum_correct =
01-01-20 1 2
02-01-20 2 4
03-01-20 4 8
04-01-20 6 12
05-01-20 8 16
That is, instead of turning every row whose date appears in only one data frame into NaN, the result should keep the values from the data frame that actually has data for that date.
How can this be done?
Use DataFrame.add with the fill_value parameter:
df_sum = df1.set_index("date").add(df2.set_index("date"), fill_value=0)
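For reference, here is a minimal, self-contained sketch of the fill_value approach using the sample data from the question. The final cast to int is an addition for illustration only: aligning the two frames introduces missing values before they are filled, so the sum typically comes back as float.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "date": ["01-01-20", "02-01-20", "03-01-20", "04-01-20", "05-01-20"],
    "col1": [1, 2, 3, 4, 5],
    "col2": [2, 4, 6, 8, 10],
})
df2 = pd.DataFrame({
    "date": ["03-01-20", "04-01-20", "05-01-20"],
    "col1": [1, 2, 3],
    "col2": [2, 4, 6],
})

# Dates present in only one frame are treated as 0 in the other frame.
df_sum = df1.set_index("date").add(df2.set_index("date"), fill_value=0)

# Optional (not part of the original answer): cast back to integers.
df_sum = df_sum.astype(int)
print(df_sum)
```

Note that fill_value only covers cells missing from one of the two frames; a cell missing from both still comes out as NaN.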
Or use concat with an aggregate sum:
df_sum = pd.concat([df1, df2]).groupby("date").sum()
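A self-contained sketch of the concat approach with the question's sample data follows. The second half is an optional variant that is not part of the original answer: it parses the date strings first, assuming they are in day-month-year format (the question does not say), so that groupby sorts the result chronologically rather than as plain strings.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "date": ["01-01-20", "02-01-20", "03-01-20", "04-01-20", "05-01-20"],
    "col1": [1, 2, 3, 4, 5],
    "col2": [2, 4, 6, 8, 10],
})
df2 = pd.DataFrame({
    "date": ["03-01-20", "04-01-20", "05-01-20"],
    "col1": [1, 2, 3],
    "col2": [2, 4, 6],
})

# Stack the rows of both frames, then sum all rows that share a date.
combined = pd.concat([df1, df2], ignore_index=True)
df_sum = combined.groupby("date").sum()
print(df_sum)

# Optional variant: parse the dates first (the "%d-%m-%y" format is an assumption).
combined_dt = combined.copy()
combined_dt["date"] = pd.to_datetime(combined_dt["date"], format="%d-%m-%y")
df_sum_dt = combined_dt.groupby("date").sum()
print(df_sum_dt)
```

Because no missing values are introduced along the way, the integer columns stay integers with this approach.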
In addition to @jezrael's answer: to get the rows back as a DataFrame grouped by date, instead of the summed values, you can use the following:
df_correct = pd.concat([df1, df2]).groupby("date").apply(lambda df: df[:])