How can I calculate the sum of two `pandas.DataFrame` objects based on `pandas.DataFrame.index`?
import pandas as pd
data = [[1, 2], [3, 4]]
index1 = ['I1', 'I2']
index2 = ['I1', 'I3']
columns = ['C1', 'C2']
df1 = pd.DataFrame(data, index=index1, columns=columns)
df2 = pd.DataFrame(data, index=index2, columns=columns)
print(df1)
# C1 C2
#I1 1 2
#I2 3 4
print(df2)
# C1 C2
#I1 1 2
#I3 3 4
print(...) # Calculate somehow
## !!!!!Expected Result!!!!!
# C1 C2
#I1 2 4
#I2 3 4
#I3 3 4
The expected result is a dataframe whose values are as follows:
I1: the sum of the two dataframes, because df1 and df2 both have a row named 'I1'.
I2: the values of df1.loc['I2'], because df2 doesn't have this index.
I3: the values of df2.loc['I3'], because df1 doesn't have this index.
Here is what I tried:
print(df1.add(df2, axis='index'))
# C1 C2
#I1 2.0 4.0
#I2 NaN NaN
#I3 NaN NaN
print(pd.concat([df1, df2]))
# C1 C2
#I1 1 2
#I2 3 4
#I1 1 2
#I3 3 4
print(df1 + df2.values)
# C1 C2
#I1 2 4
#I2 6 8
Could you help me get the expected result?
Try using DataFrame.add():
df = df1.add(df2, fill_value=0)
The resulting dataframe matches your expected output, but the values come back as floats, so you may need to fix the dtypes. You can use:
import numpy as np
df["C1"] = df["C1"].astype(np.int64)
df["C2"] = df["C2"].astype(np.int64)
If you are not using numpy, use plain int instead of np.int64 in the code.
For documentation on this, see the pandas documentation.
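For completeness, here is a self-contained sketch of this approach that rebuilds the frames from the question. fill_value=0 treats a row that is missing from one frame as 0 before adding. Casting the whole frame with a single astype(int) call is my own shortcut, not part of the answer above, and it assumes every column should end up as an integer. The names summed and result are just illustrative.

```python
import pandas as pd

data = [[1, 2], [3, 4]]
columns = ['C1', 'C2']
df1 = pd.DataFrame(data, index=['I1', 'I2'], columns=columns)
df2 = pd.DataFrame(data, index=['I1', 'I3'], columns=columns)

# Rows present in only one frame are treated as 0 in the other frame.
summed = df1.add(df2, fill_value=0)

# Assumption: all columns should be integers, so cast the whole frame at once
# instead of casting column by column.
result = summed.astype(int)
print(result)
```

If only some columns should be integers, the per-column casts shown above are the safer choice.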
Try chaining concat with groupby:
out = pd.concat([df1, df2]).groupby(level=0).sum()
Out[161]:
C1 C2
I1 2 4
I2 3 4
I3 3 4
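This works because concat stacks the rows and keeps the duplicated label I1, and groupby(level=0) then sums all rows that share an index label. A side effect is that the same one-liner should extend to any number of frames. The sketch below assumes a third frame, df_extra, which is hypothetical and not part of the question.

```python
import pandas as pd

data = [[1, 2], [3, 4]]
columns = ['C1', 'C2']
df1 = pd.DataFrame(data, index=['I1', 'I2'], columns=columns)
df2 = pd.DataFrame(data, index=['I1', 'I3'], columns=columns)

# Hypothetical extra frame, added only to show that the pattern scales.
df_extra = pd.DataFrame([[10, 20]], index=['I2'], columns=columns)

frames = [df1, df2, df_extra]

# Stack all rows, then sum the rows that share an index label.
out = pd.concat(frames).groupby(level=0).sum()
print(out)
```

Since no missing values are introduced along the way, the integer columns should not need a dtype fix afterwards.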
What you are looking for is the df.combine method. It combines your two dataframes using a function you supply, as the docs show:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine.html
So basically what you need to do is the following:
func = lambda s1,s2: s1+s2
df3 = df1.combine(df2,func,fill_value=0)
print(df3)
This gives you a little more flexibility than add, because func can be any function that takes two Series.
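To illustrate that flexibility, here is a minimal sketch that swaps the sum for an element-wise maximum. This is not what the question asks for; it only shows that combine accepts other column-wise functions. It assumes non-negative data, since fill_value=0 would otherwise win over negative values. The name largest is illustrative.

```python
import numpy as np
import pandas as pd

data = [[1, 2], [3, 4]]
columns = ['C1', 'C2']
df1 = pd.DataFrame(data, index=['I1', 'I2'], columns=columns)
df2 = pd.DataFrame(data, index=['I1', 'I3'], columns=columns)

# combine calls the function once per column with two aligned Series.
# Missing rows are replaced by fill_value before the function is called.
largest = df1.combine(df2, np.maximum, fill_value=0)
print(largest)
```

With the frames from the question, I1 keeps the values shared by both frames, while I2 and I3 keep the values from the only frame that has them.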
Here is one way to do it using combine_first, applied successively:
df3 = df1.add(df2)  # NaN wherever an index is missing from either frame
df3 = df3.combine_first(df1).combine_first(df2)  # fill the gaps from df1, then df2
df3
C1 C2
I1 2.0 4.0
I2 3.0 4.0
I3 3.0 4.0
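To see why chaining combine_first works, it can help to look at the intermediate frames. The sketch below assumes the starting point is the plain df1.add(df2) from the question, which leaves NaN in the rows that only one frame has. Each combine_first call then fills the remaining NaN cells from the frame passed to it. The names partial, step1, and filled are illustrative.

```python
import pandas as pd

data = [[1, 2], [3, 4]]
columns = ['C1', 'C2']
df1 = pd.DataFrame(data, index=['I1', 'I2'], columns=columns)
df2 = pd.DataFrame(data, index=['I1', 'I3'], columns=columns)

# Plain addition: only I1 exists in both frames, so I2 and I3 are NaN.
partial = df1.add(df2)

# Fill NaN cells from df1: this recovers I2, while I3 is still NaN.
step1 = partial.combine_first(df1)

# Fill the remaining NaN cells from df2: this recovers I3.
filled = step1.combine_first(df2)

print(partial)
print(step1)
print(filled)
```

The result stays float because the intermediate frames contain NaN, so an astype(int) call is still needed if integer columns are wanted.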