How to zip add pandas DataFrames with different columns
I would like to add two DataFrames in a zip-like manner, so that every row of df1 is combined with every row of df2:
df1:
     X
a b
1 1  2
1 2  3
df2:
   X
c
1  1
2  2
desired result:
df1+df2=
       X
a b c
1 1 1  3
1 1 2  4
1 2 1  4
1 2 2  5
The only idea I have is going row by row, but that's hideous.
The problem, as it is now, can be solved with broadcasting:
# new values
new_vals = df1.X.values[:,None] + df2.X.values[None,:]
# new dataframe:
new_df = pd.DataFrame(new_vals, index=df1.index, columns=df2.index)
# stack for the multi-index:
new_df.stack()
output:
a  b  c
1  1  1    3
      2    4
   2  1    4
      2    5
dtype: int64
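For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same broadcasting idea. The way df1 and df2 are built is an assumption reconstructed from the tables in the question, and `to_numpy()` is used in place of `.values`.

```python
import pandas as pd

# Frames rebuilt from the question; how they were originally created is an assumption.
df1 = pd.DataFrame({'a': [1, 1], 'b': [1, 2], 'X': [2, 3]}).set_index(['a', 'b'])
df2 = pd.DataFrame({'c': [1, 2], 'X': [1, 2]}).set_index('c')

# Outer sum via broadcasting: rows follow df1, columns follow df2.
new_vals = df1['X'].to_numpy()[:, None] + df2['X'].to_numpy()[None, :]
new_df = pd.DataFrame(new_vals, index=df1.index, columns=df2.index)

# Move the 'c' level from the columns into the row index.
result = new_df.stack()
print(result)
```

Because `columns=df2.index` carries the name `c`, the stacked result should end up with the index levels a, b, c without any renaming.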
It still works if you have more than one column, but needs a little tweaking of new_df's columns:
df1 = (pd.DataFrame({'a': [1, 1],
                     'b': [1, 2],
                     'X': [0, 3],
                     'Y': [1, 2]})
         .set_index(['a', 'b'])
      )
df2 = (pd.DataFrame({'c': [1, 2, 3],
                     'X': [1, 2, 3],
                     'Y': [0, 1, 5]})
         .set_index('c')
      )
# shape: (len(df1), len(df2), number of columns)
new_vals = df1.values[:,None] + df2.values[None,:]
new_df = pd.DataFrame(data=new_vals.reshape(len(df1), df2.shape[1]*df2.shape[0]),
                      index=df1.index,
                      columns=pd.MultiIndex.from_product((df2.index, df2.columns))
                     )
# stack the df2 index level to get one row per (a, b, c):
new_df.stack(level=0)
Output:
       X  Y
a b
1 1 1  1  1
    2  2  2
    3  3  6
  2 1  4  2
    2  5  3
    3  6  7
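If this is needed more than once, the multi-column version could be wrapped in a small helper. The sketch below is only one possible way to do it: `outer_add` is a hypothetical name, it assumes both frames have the same columns in the same order, that `left` has a MultiIndex, and that `right` has a single named index. It builds the three-level index directly instead of stacking.

```python
import pandas as pd

def outer_add(left, right):
    """Add every row of `right` to every row of `left` (hypothetical helper).

    Assumes identical columns in identical order, a MultiIndex on `left`
    and a single named index on `right`.
    """
    # Shape (len(left), len(right), n_columns) via broadcasting.
    vals = left.to_numpy()[:, None, :] + right.to_numpy()[None, :, :]
    # One output row per (left row, right row) pair, in left-major order.
    idx = pd.MultiIndex.from_tuples(
        [l + (r,) for l in left.index.to_flat_index() for r in right.index],
        names=list(left.index.names) + [right.index.name],
    )
    return pd.DataFrame(vals.reshape(-1, left.shape[1]),
                        index=idx, columns=left.columns)

df1 = pd.DataFrame({'a': [1, 1], 'b': [1, 2],
                    'X': [0, 3], 'Y': [1, 2]}).set_index(['a', 'b'])
df2 = pd.DataFrame({'c': [1, 2, 3],
                    'X': [1, 2, 3], 'Y': [0, 1, 5]}).set_index('c')

summed = outer_add(df1, df2)
print(summed)
```

Unlike the stacked version, this keeps the name `c` on the third index level, at the cost of a Python-level loop to build the index.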
It is easy using concat (note that axis has to be passed by keyword in recent pandas versions):
pd.concat([df1+df2.loc[x] for x in df2.index], axis=1, keys=df2.index).stack(0)
Out[267]:
       X
a b c
1 1 1  3
    2  4
  2 1  4
    2  5
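Spelled out step by step, the concat one-liner might look like the sketch below. The frame construction is again an assumption based on the question, and the intermediate names `pieces` and `wide` are introduced only for illustration.

```python
import pandas as pd

# Frames rebuilt from the question (assumed construction).
df1 = pd.DataFrame({'a': [1, 1], 'b': [1, 2], 'X': [2, 3]}).set_index(['a', 'b'])
df2 = pd.DataFrame({'c': [1, 2], 'X': [1, 2]}).set_index('c')

# One shifted copy of df1 per row of df2; df2.loc[c] is a Series that
# aligns with df1's columns, so its X value is added to every row of df1.
pieces = [df1 + df2.loc[c] for c in df2.index]

# Put the copies side by side, labelled by the value of c.
wide = pd.concat(pieces, axis=1, keys=df2.index)

# Move the 'c' column level into the row index.
out = wide.stack(level=0)
print(out)
```

This loops over the rows of df2 in Python, so for a large df2 the broadcasting approach is likely to be faster.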
Another solution: create a new index with MultiIndex.from_tuples from a list comprehension, then use DataFrame.reindex and DataFrame.add:
new_idx = pd.MultiIndex.from_tuples([x + (y,) for x in df1.index.to_flat_index()
for y in df2.index], names=['a', 'b', 'c'])
df1.reindex(new_idx).add(df2)
[out]
       X
a b c
1 1 1  3
    2  4
  2 1  4
    2  5