import pandas as pd
data = [[1, 2], [3, 4]]
index1 = ['I1', 'I2']
index2 = ['I1', 'I3']
columns = ['C1', 'C2']
df1 = pd.DataFrame(data, index=index1, columns=columns)
df2 = pd.DataFrame(data, index=index2, columns=columns)
print(df1)
# C1 C2
#I1 1 2
#I2 3 4
print(df2)
# C1 C2
#I1 1 2
#I3 3 4
print(...) # Calculate somehow
## !!!!!Expected Result!!!!!
# C1 C2
#I1 2 4
#I2 3 4
#I3 3 4
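The expected result is a DataFrame whose rows are as follows:
I1: the sum of df1.loc['I1'] and df2.loc['I1'], because df1 and df2 both have a row named 'I1'.
I2: df1.loc['I2'], because df2 doesn't have this index.
I3: df2.loc['I3'], because df1 doesn't have this index.
I tried the following, but none of these gives the expected result:
print(df1.add(df2, axis='index'))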
# C1 C2
#I1 2.0 4.0
#I2 NaN NaN
#I3 NaN NaN
print(pd.concat([df1, df2]))
# C1 C2
#I1 1 2
#I2 3 4
#I1 1 2
#I3 3 4
print(df1 + df2.values)
# C1 C2
#I1 2 4
#I2 6 8
Could you help me get the expected result?
Try using DataFrame.add()
df = df1.add(df2, fill_value=0)
The resulting DataFrame matches your expected output, but its values are floats, so you may need to fix the dtypes. You can use:
import numpy as np
df["C1"] = df["C1"].astype(np.int64)
df["C2"] = df["C2"].astype(np.int64)
If you would rather not use numpy, use plain int instead of np.int64 in the code above.
For documentation on this, see the pandas documentation.
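As a minimal end-to-end sketch of this answer, the snippet below rebuilds the two frames from the question, adds them with fill_value=0, and casts the whole result back to integers in one step. The names summed and result are illustrative only, and casting the full frame with astype(int) is a shortcut that assumes every column should become an integer.

```python
import pandas as pd

# Frames from the question
df1 = pd.DataFrame([[1, 2], [3, 4]], index=["I1", "I2"], columns=["C1", "C2"])
df2 = pd.DataFrame([[1, 2], [3, 4]], index=["I1", "I3"], columns=["C1", "C2"])

# fill_value=0 stands in for a row that is missing from one of the frames
summed = df1.add(df2, fill_value=0)

# Aligning on the union of the indexes goes through NaN, so the sum comes
# back as floats; cast every column back to integers at once
result = summed.astype(int)
print(result)
```

Casting the whole frame avoids repeating the astype call per column, but it only works when no NaN remains in the result.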
Try chaining concat with groupby:
out = pd.concat([df1, df2]).groupby(level=0).sum()
Out[161]:
C1 C2
I1 2 4
I2 3 4
I3 3 4
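A self-contained sketch of the same idea, split into two steps to make the mechanism visible: concat stacks the rows (so 'I1' appears twice), and groupby(level=0) then sums rows that share an index label. The name stacked is illustrative only.

```python
import pandas as pd

# Frames from the question
df1 = pd.DataFrame([[1, 2], [3, 4]], index=["I1", "I2"], columns=["C1", "C2"])
df2 = pd.DataFrame([[1, 2], [3, 4]], index=["I1", "I3"], columns=["C1", "C2"])

# Step 1: stack the rows; the label 'I1' now occurs twice
stacked = pd.concat([df1, df2])

# Step 2: group by the index (level 0) and sum each group
out = stacked.groupby(level=0).sum()
print(out)
```

Because no NaN is ever introduced, the integer dtypes survive here, so no cast is needed afterwards.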
What you are looking for is the DataFrame.combine method. It combines the two DataFrames column by column using a function you supply, as the docs show:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine.html
So basically, what you need to do is the following:
func = lambda s1,s2: s1+s2
df3 = df1.combine(df2,func,fill_value=0)
print(df3)
This gives you a little more flexibility than add, because func can be any function that takes two Series.
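To illustrate that flexibility, here is a small sketch that runs combine twice on the frames from the question: once with addition, and once with numpy's element-wise maximum. The maximum variant is not part of the question; it is only an assumed example of swapping in a different function, and the names summed and largest are illustrative.

```python
import numpy as np
import pandas as pd

# Frames from the question
df1 = pd.DataFrame([[1, 2], [3, 4]], index=["I1", "I2"], columns=["C1", "C2"])
df2 = pd.DataFrame([[1, 2], [3, 4]], index=["I1", "I3"], columns=["C1", "C2"])

# Addition through combine; fill_value=0 replaces the gaps left by alignment
summed = df1.combine(df2, lambda s1, s2: s1 + s2, fill_value=0)

# Same call with a different rule (illustrative): keep the larger value
largest = df1.combine(df2, np.maximum, fill_value=0)

print(summed)
print(largest)
```

The function receives one column from each frame at a time, so any operation that maps two Series to one Series can be used.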
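Here is one way to do it, using combine_first successively:
# Assumption: df3 starts as the plain sum, which leaves NaN in rows found in only one frame
df3 = df1.add(df2)
df3 = df3.combine_first(df1).combine_first(df2)
df3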
C1 C2
I1 2.0 4.0
I2 3.0 4.0
I3 3.0 4.0
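The steps can be traced in a self-contained sketch, assuming df3 starts as the plain element-wise sum df1.add(df2). The names partial and missing_rows are illustrative only. Each combine_first call fills only the cells that are still NaN, so the summed 'I1' row is left untouched.

```python
import pandas as pd

# Frames from the question
df1 = pd.DataFrame([[1, 2], [3, 4]], index=["I1", "I2"], columns=["C1", "C2"])
df2 = pd.DataFrame([[1, 2], [3, 4]], index=["I1", "I3"], columns=["C1", "C2"])

# Assumed starting point: the plain sum, NaN where a row exists in one frame only
partial = df1.add(df2)
missing_rows = partial.index[partial.isna().all(axis=1)].tolist()
print(missing_rows)

# Fill the remaining NaN cells from df1 first, then from df2
df3 = partial.combine_first(df1).combine_first(df2)
print(df3)
```

The result stays in floating point, so the same integer cast shown for the add approach applies if integer columns are required.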