How to zip add pandas DataFrames with different columns

I would like to add two DataFrames in a zip-like manner, so that every row of df1 is combined with every row of df2:

df1:

        X
a   b
1   1   2
1   2   3

df2:

   X
c
1   1
2   2

Desired result:

df1+df2=
           X
a   b   c
1   1   1   3
1   1   2   4
1   2   1   4
1   2   2   5

The only idea I have is going row by row, but that's hideous.

The problem, as it is now, can be solved with broadcasting:

# new values
new_vals = df1.X.values[:,None] + df2.X.values[None,:]

# new dataframe:
new_df = pd.DataFrame(new_vals, index=df1.index, columns=df2.index)

# stack for the multi-index:
new_df.stack()

Output:

a  b  c
1  1  1    3
      2    4
   2  1    4
      2    5
dtype: int64
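
For anyone who wants to run the broadcasting approach end to end, here is a minimal self-contained sketch. The two frames are rebuilt from the tables in the question; the constructor calls and the use of to_numpy() instead of .values are my own choices, not part of the answer.

```python
import pandas as pd

# Rebuild the question's frames (values taken from the tables above).
df1 = pd.DataFrame({"a": [1, 1], "b": [1, 2], "X": [2, 3]}).set_index(["a", "b"])
df2 = pd.DataFrame({"c": [1, 2], "X": [1, 2]}).set_index("c")

# Outer sum via broadcasting: a (len(df1), 1) column plus a (1, len(df2)) row
# gives a (len(df1), len(df2)) matrix of all pairwise sums.
new_vals = df1["X"].to_numpy()[:, None] + df2["X"].to_numpy()[None, :]

# Rows keep df1's (a, b) index, columns carry df2's c index.
new_df = pd.DataFrame(new_vals, index=df1.index, columns=df2.index)

# Stacking moves c from the columns into the row index.
result = new_df.stack()
print(result)
```

The result is a Series indexed by (a, b, c); call result.to_frame("X") if a one-column DataFrame is preferred.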

It still works if you have more than one column, but it needs a little tweaking of new_df's columns:

df1 = (pd.DataFrame({'a':[1,1],
                    'b':[1,2],
                    'X':[0,3],
                    'Y':[1,2]})
         .set_index(['a','b'])
      )

df2 = (pd.DataFrame({'c':[1,2,3],
                    'X':[1,2,3],
                    'Y':[0,1,5]})
         .set_index('c')
      )

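# broadcast to shape (len(df1), len(df2), number of columns)
new_vals = df1.values[:,None] + df2.values[None,:]

new_df = pd.DataFrame(data=new_vals.reshape(len(df1), df2.shape[1]*df2.shape[0]),
                      index=df1.index,
                      columns=pd.MultiIndex.from_product((df2.index, df2.columns))
                     )

# move df2's index from the columns into the rows:
new_df.stack(level=0)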

Output:

       X  Y
a b        
1 1 1  1  1
    2  2  2
    3  3  6
  2 1  4  2
    2  5  3
    3  6  7
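
A possible variation on the multi-column case, sketched here under the assumption that both frames have the same columns in the same order: reshape the broadcast array straight into long form and build the three-level index by hand, so no stacking is needed. The name long_idx is mine, and this is not the answer's exact method.

```python
import pandas as pd

df1 = (pd.DataFrame({"a": [1, 1], "b": [1, 2], "X": [0, 3], "Y": [1, 2]})
         .set_index(["a", "b"]))
df2 = (pd.DataFrame({"c": [1, 2, 3], "X": [1, 2, 3], "Y": [0, 1, 5]})
         .set_index("c"))

# Shape (len(df1), len(df2), n_columns): every df1 row plus every df2 row.
new_vals = df1.to_numpy()[:, None, :] + df2.to_numpy()[None, :, :]

# One (a, b, c) label per pair, in the same row-major order as new_vals.
long_idx = pd.MultiIndex.from_tuples(
    [ab + (c,) for ab in df1.index for c in df2.index],
    names=[*df1.index.names, df2.index.name],
)

# Collapse the first two axes so each pair becomes one row.
result = pd.DataFrame(
    new_vals.reshape(-1, df1.shape[1]), index=long_idx, columns=df1.columns
)
print(result)
```

Unlike the stacked version, the third index level keeps its name (c) here because the index is built explicitly.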

It is easy using concat:

pd.concat([df1+df2.loc[x] for x in df2.index],axis=1,keys=df2.index).stack(0)
Out[267]: 
       X
a b c   
1 1 1  3
    2  4
  2 1  4
    2  5
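
The one-liner above can be unpacked into steps. This is a minimal sketch with the question's frames rebuilt by hand; the intermediate names pieces and wide are mine. It passes axis=1 by keyword, since recent pandas versions no longer accept it positionally.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 1], "b": [1, 2], "X": [2, 3]}).set_index(["a", "b"])
df2 = pd.DataFrame({"c": [1, 2], "X": [1, 2]}).set_index("c")

# df2.loc[c] is a Series indexed by column name, so adding it to df1
# aligns on the columns and shifts every row of df1 by that df2 row.
pieces = [df1 + df2.loc[c] for c in df2.index]

# Place the shifted copies side by side, labelling each block with its c value.
wide = pd.concat(pieces, axis=1, keys=df2.index)

# Move the c level from the columns into the row index.
result = wide.stack(level=0)
print(result)
```

This loops over the rows of df2 in Python, so it is likely to be slower than broadcasting when df2 is large.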

Another solution: create a new MultiIndex with MultiIndex.from_tuples from a list comprehension, then use DataFrame.reindex and DataFrame.add:

new_idx = pd.MultiIndex.from_tuples([x + (y,) for x in df1.index.to_flat_index()
                                     for y in df2.index], names=['a', 'b', 'c'])

df1.reindex(new_idx).add(df2)

[out]

       X
a b c   
1 1 1  3
    2  4
  2 1  4
    2  5
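
If you prefer not to rely on how reindex and add align indexes with different numbers of levels, a more explicit variant of the same idea is sketched below. It is my own rewording, not the answer's code: each frame is reindexed against the matching levels of the new index, and the raw values are added. The names left and right are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 1], "b": [1, 2], "X": [2, 3]}).set_index(["a", "b"])
df2 = pd.DataFrame({"c": [1, 2], "X": [1, 2]}).set_index("c")

# Every (a, b) label of df1 paired with every c label of df2.
new_idx = pd.MultiIndex.from_tuples(
    [ab + (c,) for ab in df1.index for c in df2.index],
    names=["a", "b", "c"],
)

# Repeat df1's rows to match the (a, b) part of the new index ...
left = df1.reindex(new_idx.droplevel("c")).to_numpy()
# ... and df2's rows to match the c part.
right = df2.reindex(new_idx.get_level_values("c")).to_numpy()

# Both arrays now have one row per (a, b, c) label, so they add elementwise.
result = pd.DataFrame(left + right, index=new_idx, columns=df1.columns)
print(result)
```

This assumes df1 and df2 have unique indexes and the same columns in the same order.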
