Add a new dataframe to an existing database, but only where the column names match

Question

I have two dataframes that I am trying to combine, but I'm not getting the result I want using pandas.concat.

I have a database that I want to add new data to, but only for the columns whose names match.

Let's say df1 is:

A B C D
1 1 2 2
3 3 4 4
5 5 6 6

and df2 is:

A E D F
7 7 8 8
9 9 0 0

The result I would like to get is:

A B C D
1 1 2 2
3 3 4 4
5 5 6 6
7 - - 8
9 - - 0

The blank data doesn't have to be "-"; it can be anything.

When I use:

results = pandas.concat([df1, df2], axis=0, join='outer')

it gives me a new dataframe with all of the columns A through F, instead of what I want. Any ideas for how I can accomplish this? Thanks!
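
For anyone who wants to reproduce the problem, here is a minimal sketch. The df1 values are an assumption inferred from the output shown in the answers; df2 is the frame from the question.

```python
import pandas as pd

# df1 values are assumed (inferred from the answers' output).
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [1, 3, 5], "C": [2, 4, 6], "D": [2, 4, 6]})
# df2 as given in the question.
df2 = pd.DataFrame({"A": [7, 9], "E": [7, 9], "D": [8, 0], "F": [8, 0]})

# An outer join keeps the union of the column labels,
# so E and F from df2 end up in the result as well.
results = pd.concat([df1, df2], axis=0, join="outer")
print(results)
```

The extra columns come from join="outer", which keeps every label that appears in either frame.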

Answer 1

You want to use the pd.DataFrame.align method, specifying that you want to align to the left argument's labels and that you only care about the columns.

d1, d2 = df1.align(df2, join='left', axis=1)

Then you can use pd.concat or pd.DataFrame.append. (DataFrame.append was removed in pandas 2.0, so on current versions only pd.concat works.)

pd.concat([d1, d2], ignore_index=True)

   A    B    C  D
0  1  1.0  2.0  2
1  3  3.0  4.0  4
2  5  5.0  6.0  6
3  7  NaN  NaN  8
4  9  NaN  NaN  0

Or

d1.append(d2, ignore_index=True)

   A    B    C  D
0  1  1.0  2.0  2
1  3  3.0  4.0  4
2  5  5.0  6.0  6
3  7  NaN  NaN  8
4  9  NaN  NaN  0

My preferred way is to skip assigning the aligned frames to names:

pd.concat(df1.align(df2, join='left', axis=1), ignore_index=True)

   A    B    C  D
0  1  1.0  2.0  2
1  3  3.0  4.0  4
2  5  5.0  6.0  6
3  7  NaN  NaN  8
4  9  NaN  NaN  0
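
To make it easier to see what align does before the concat, here is a self-contained sketch. The df1 values are an assumption inferred from the output above; the variable name combined is only for illustration.

```python
import pandas as pd

# df1 values are assumed (inferred from the output above); df2 is from the question.
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [1, 3, 5], "C": [2, 4, 6], "D": [2, 4, 6]})
df2 = pd.DataFrame({"A": [7, 9], "E": [7, 9], "D": [8, 0], "F": [8, 0]})

# join="left" with axis=1 makes both frames carry df1's column labels:
# E and F are dropped from df2, and B and C are added to it as NaN.
d1, d2 = df1.align(df2, join="left", axis=1)
print(d2)

# Stack the aligned frames and renumber the rows.
combined = pd.concat([d1, d2], ignore_index=True)
print(combined)
```

B and C show up as floats in the combined frame because NaN cannot be stored in an integer column.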

Answer 2

You can find the intersection of df1's columns with df2's columns, select those from df2, and then concat or append:

pd.concat(
    [df1, df2[df1.columns.intersection(df2.columns)]]
)

Or,

df1.append(df2[df1.columns.intersection(df2.columns)])

   A    B    C  D
0  1  1.0  2.0  2
1  3  3.0  4.0  4
2  5  5.0  6.0  6
0  7  NaN  NaN  8
1  9  NaN  NaN  0
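
As a runnable sketch of the same idea (the df1 values are an assumption inferred from the output above, and the names common and combined are only for illustration), with ignore_index=True added in case you do not want the repeated 0 and 1 row labels:

```python
import pandas as pd

# df1 values are assumed (inferred from the output above); df2 is from the question.
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [1, 3, 5], "C": [2, 4, 6], "D": [2, 4, 6]})
df2 = pd.DataFrame({"A": [7, 9], "E": [7, 9], "D": [8, 0], "F": [8, 0]})

# Column labels present in both frames (A and D here).
common = df1.columns.intersection(df2.columns)

# Keep only the shared columns of df2; concat fills B and C with NaN.
combined = pd.concat([df1, df2[common]], ignore_index=True)
print(combined)
```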

Answer 3

You can also use reindex and concat:

pd.concat([df1, df2.reindex(columns=df1.columns)])
   A    B    C  D
0  1  1.0  2.0  2
1  3  3.0  4.0  4
2  5  5.0  6.0  6
0  7  NaN  NaN  8
1  9  NaN  NaN  0
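
Since the question says the blanks can be anything, reindex also lets you pick the filler through its fill_value parameter. A minimal sketch, assuming the df1 values inferred from the output above and choosing 0 as the filler purely for illustration:

```python
import pandas as pd

# df1 values are assumed (inferred from the output above); df2 is from the question.
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [1, 3, 5], "C": [2, 4, 6], "D": [2, 4, 6]})
df2 = pd.DataFrame({"A": [7, 9], "E": [7, 9], "D": [8, 0], "F": [8, 0]})

# Conform df2 to df1's columns: E and F are dropped,
# and the missing B and C are filled with 0 instead of NaN.
filled = df2.reindex(columns=df1.columns, fill_value=0)

combined = pd.concat([df1, filled], ignore_index=True)
print(combined)
```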

Answer 4

Transpose both frames first, merge them, and then transpose the result back.

df1.T.merge(df2.T, how="left", left_index=True, right_index=True).T

    A   B   C   D
0_x 1.0 1.0 2.0 2.0
1_x 3.0 3.0 4.0 4.0
2   5.0 5.0 6.0 6.0
0_y 7.0 NaN NaN 8.0
1_y 9.0 NaN NaN 0.0

df1.T           df2.T

    0   1   2      0 1
A   1   3   5   A  7 9
B   1   3   5   E  7 9
C   2   4   6   D  8 0
D   2   4   6   F  8 0

The result can now be obtained with a merge using how="left", with the indices as the join keys by passing left_index=True and right_index=True.

df1.T.merge(df2.T, how="left", left_index=True, right_index=True)

    0_x 1_x 2   0_y 1_y
A   1   3   5   7.0 9.0
B   1   3   5   NaN NaN
C   2   4   6   NaN NaN
D   2   4   6   8.0 0.0
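
The transposed-back result keeps the suffixed labels (0_x, 0_y, ...) as its row index. One way to tidy that up, as a sketch assuming the df1 values inferred from the output above (the names merged and result are only for illustration), is to reset the index afterwards:

```python
import pandas as pd

# df1 values are assumed (inferred from the output above); df2 is from the question.
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [1, 3, 5], "C": [2, 4, 6], "D": [2, 4, 6]})
df2 = pd.DataFrame({"A": [7, 9], "E": [7, 9], "D": [8, 0], "F": [8, 0]})

# After transposing, the column names become the index, so a left merge
# on the index keeps only df1's names (A, B, C, D).
merged = df1.T.merge(df2.T, how="left", left_index=True, right_index=True)

# Transpose back and replace the suffixed row labels with 0..n-1.
result = merged.T.reset_index(drop=True)
print(result)
```

Note that every column comes back as float here, since the transposed frame mixes integers and NaN.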

4 answers

Answer 1: 6, accepted, 2018-02-20 01:00:59

Answer 2: 4, 2018-02-20 01:03:33

Answer 3: 3, 2018-02-20 02:33:45

Answer 4: 0, 2018-02-20 02:45:19
