
Pandas: add (sum) dataframes with some different indices and columns

I'm trying to use pandas to add DataFrames together, but the approaches I've tried aren't working. Generally, the two DataFrames will share a few rows (whose values should be added) and have a few rows that differ (which should be kept, i.e. concatenated). The index may differ as well, and one frame may have extra columns, as below:

import numpy as np
import pandas as pd

# dataframe 1: ten monthly periods, Jan 2017 to Oct 2017
pi = pd.period_range(start='2017-01', periods=10, freq='M')
a = pd.Series(np.full(shape=10, fill_value=2), pi)
b = pd.Series(np.full(shape=10, fill_value=3), pi)
df1 = pd.DataFrame({'data_1': a, 'data_2': b})

# dataframe 2, to be added: longer index (Jan 2016 to Jun 2018) and an additional data column
pi2 = pd.period_range(start='2016-01', periods=30, freq='M')
a = pd.Series(np.full(shape=30, fill_value=2), pi2)
b = pd.Series(np.full(shape=30, fill_value=3), pi2)
c = pd.Series(np.full(shape=30, fill_value=3), pi2)
df2 = pd.DataFrame({'data_1': a, 'data_2': b, 'data_3': c})

new_df = df1 + df2
# returns a sum wherever both frames share an index label and column, and NaN
# everywhere else --> need to preserve the values at those other locations:
# data_3 should be a series full of 3s instead of NaNs
# new_df.iloc[0, 0] should return 2 instead of NaN

I've tried a few things, but haven't got it to work, because any `notnull` or `fillna` call only happens after the NaNs have already been generated.
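To make the failure mode concrete, here is a minimal sketch using smaller, hypothetical stand-ins for the frames in the question (same structure, fewer periods). It shows that filling NaNs after `+` cannot bring back the values that alignment discarded:

```python
import numpy as np
import pandas as pd

# Hypothetical, smaller stand-ins for df1 and df2 from the question
pi = pd.period_range(start="2017-01", periods=3, freq="M")
pi2 = pd.period_range(start="2016-12", periods=5, freq="M")
df1 = pd.DataFrame({"data_1": np.full(3, 2), "data_2": np.full(3, 3)}, index=pi)
df2 = pd.DataFrame(
    {"data_1": np.full(5, 2), "data_2": np.full(5, 3), "data_3": np.full(5, 3)},
    index=pi2,
)

# + aligns on the union of labels first; any cell missing from either frame
# becomes NaN, so the value from the other frame is already gone.
summed = df1 + df2
patched = summed.fillna(0)

# 2016-12 exists only in df2 (where data_1 is 2), yet the patched result holds 0
print(patched.loc[pd.Period("2016-12", freq="M"), "data_1"])  # 0.0
```

Since `+` has already replaced the unmatched cells with NaN, `fillna(0)` has nothing left to recover; the filling has to happen during or before alignment.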

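new_idx = df1.index.union(df2.index)
new_cols = df1.columns.union(df2.columns)
# reindex both frames onto the combined labels so the missing cells become NaN,
# replace those with 0, and only then add
new_df = (df1.reindex(index=new_idx, columns=new_cols).fillna(0)
          + df2.reindex(index=new_idx, columns=new_cols).fillna(0))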
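The same union-then-fill idea can also be expressed with `DataFrame.align`, an alternative that the original answer does not use. This is a sketch that rebuilds the question's frames with `pd.period_range` and assumes `align(join="outer", fill_value=0)` puts 0 into the cells that the alignment introduces:

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question
pi = pd.period_range(start="2017-01", periods=10, freq="M")
pi2 = pd.period_range(start="2016-01", periods=30, freq="M")
df1 = pd.DataFrame({"data_1": np.full(10, 2), "data_2": np.full(10, 3)}, index=pi)
df2 = pd.DataFrame(
    {"data_1": np.full(30, 2), "data_2": np.full(30, 3), "data_3": np.full(30, 3)},
    index=pi2,
)

# align() reindexes both frames onto the union of their index and column
# labels (join="outer") and fills the newly created cells with 0.
left, right = df1.align(df2, join="outer", fill_value=0)
new_df = left + right
```

Compared with the explicit `union` + `reindex` version, `align` computes both unions in one call.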

Edit: Actually, you can just use:

new_df = df1.add(df2, fill_value=0)
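A self-contained sketch of the `fill_value` approach, rebuilding the question's frames with `pd.period_range`. `fill_value=0` stands in for a cell that is missing from only one of the two frames. According to the pandas documentation, a cell missing in both frames stays NaN, and the small frames `x` and `y` (hypothetical, for illustration only) show that case:

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question
pi = pd.period_range(start="2017-01", periods=10, freq="M")
pi2 = pd.period_range(start="2016-01", periods=30, freq="M")
df1 = pd.DataFrame({"data_1": np.full(10, 2), "data_2": np.full(10, 3)}, index=pi)
df2 = pd.DataFrame(
    {"data_1": np.full(30, 2), "data_2": np.full(30, 3), "data_3": np.full(30, 3)},
    index=pi2,
)

# Rows/columns present in only one frame keep that frame's value
new_df = df1.add(df2, fill_value=0)
print(new_df.shape)  # (30, 3)

# Hypothetical frames: fill_value does not help when BOTH sides are missing
x = pd.DataFrame({"v": [1.0, np.nan]})
y = pd.DataFrame({"v": [np.nan, np.nan]})
both_missing = x.add(y, fill_value=0)
# row 0 -> 1.0 (only y is missing); row 1 -> NaN (missing in both)
```

If pre-existing NaNs in the inputs should also count as 0, calling `fillna(0)` on the result handles the "missing in both" case.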
