Add new dataframe to existing database but only add if column name matches

I have two dataframes that I am trying to combine, but I'm not getting the result I want using pandas.concat.

I have a database of data that I want to add new data to, but only for columns whose names match.

Let's say df1 is:

A B C D
1 1 2 2
3 3 4 4
5 5 6 6

and df2 is:

A E D F
7 7 8 8
9 9 0 0

The result I would like to get is:

A B C D
1 1 2 2
3 3 4 4
5 5 6 6
7 - - 8
9 - - 0

The blank data doesn't have to be "-"; it can be anything.

When I use:

results = pandas.concat([df1, df2], axis=0, join='outer')

it gives me a new dataframe with all of the columns A through F, instead of what I want. Any ideas for how I can accomplish this? Thanks!

You want to use the pd.DataFrame.align method, specifying that you want to align to the left argument's labels (join='left') and that you only care about columns (axis=1).

d1, d2 = df1.align(df2, join='left', axis=1)

Then you can use pd.DataFrame.append or pd.concat. Note that DataFrame.append was removed in pandas 2.0, so pd.concat is the one to use on current versions.

pd.concat([d1, d2], ignore_index=True)

   A    B    C  D
0  1  1.0  2.0  2
1  3  3.0  4.0  4
2  5  5.0  6.0  6
3  7  NaN  NaN  8
4  9  NaN  NaN  0

Or

d1.append(d2, ignore_index=True)

   A    B    C  D
0  1  1.0  2.0  2
1  3  3.0  4.0  4
2  5  5.0  6.0  6
3  7  NaN  NaN  8
4  9  NaN  NaN  0

My preferred way would be to skip the reassignment to names:

pd.concat(df1.align(df2, join='left', axis=1), ignore_index=True)

   A    B    C  D
0  1  1.0  2.0  2
1  3  3.0  4.0  4
2  5  5.0  6.0  6
3  7  NaN  NaN  8
4  9  NaN  NaN  0
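
For anyone who wants to run this answer end to end, here is a minimal sketch. It assumes df1 and df2 are built exactly as shown in the question; the variable name result is only illustrative.

```python
import pandas as pd

# Frames as given in the question.
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [1, 3, 5], "C": [2, 4, 6], "D": [2, 4, 6]})
df2 = pd.DataFrame({"A": [7, 9], "E": [7, 9], "D": [8, 0], "F": [8, 0]})

# Align df2 to df1's columns only (axis=1):
# E and F are dropped from d2, while B and C are added to it as NaN.
d1, d2 = df1.align(df2, join="left", axis=1)

# Stack the aligned frames and renumber the rows 0..4.
result = pd.concat([d1, d2], ignore_index=True)
print(result)
```

Because B and C now contain NaN, pandas stores them as floats, which is why they print as 1.0, 2.0 and so on in the outputs above.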

You can find the intersection of the columns, select those columns from df2, and then concat or append:

pd.concat(
    [df1, df2[df1.columns.intersection(df2.columns)]]
)

Or, with append (pandas < 2.0 only):

df1.append(df2[df1.columns.intersection(df2.columns)])

   A    B    C  D
0  1  1.0  2.0  2
1  3  3.0  4.0  4
2  5  5.0  6.0  6
0  7  NaN  NaN  8
1  9  NaN  NaN  0
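
If it helps to see which columns of df2 are kept and which are discarded, the intersection idea can be sketched as below. This assumes the frames from the question; the names shared, dropped and result are only illustrative, and ignore_index=True is an optional addition that renumbers the rows.

```python
import pandas as pd

# Frames as given in the question.
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [1, 3, 5], "C": [2, 4, 6], "D": [2, 4, 6]})
df2 = pd.DataFrame({"A": [7, 9], "E": [7, 9], "D": [8, 0], "F": [8, 0]})

# Columns present in both frames; only these are taken from df2.
shared = df1.columns.intersection(df2.columns)

# Columns that exist only in df2 and will therefore be left out.
dropped = df2.columns.difference(df1.columns)

# concat fills df1's columns that are missing from df2[shared] with NaN.
result = pd.concat([df1, df2[shared]], ignore_index=True)

print(list(shared), list(dropped))
print(result)
```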

You can also use reindex and concat:

pd.concat([df1,df2.reindex(columns=df1.columns)])
Out[81]: 
   A    B    C  D
0  1  1.0  2.0  2
1  3  3.0  4.0  4
2  5  5.0  6.0  6
0  7  NaN  NaN  8
1  9  NaN  NaN  0
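
Since the question says the placeholder can be anything, reindex also accepts a fill_value argument. A minimal sketch, assuming the frames from the question and an arbitrarily chosen placeholder of 0:

```python
import pandas as pd

# Frames as given in the question.
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [1, 3, 5], "C": [2, 4, 6], "D": [2, 4, 6]})
df2 = pd.DataFrame({"A": [7, 9], "E": [7, 9], "D": [8, 0], "F": [8, 0]})

# Reshape df2 to df1's columns; columns missing from df2 (B and C)
# are filled with 0 instead of NaN. The 0 is an arbitrary choice.
df2_like_df1 = df2.reindex(columns=df1.columns, fill_value=0)

result = pd.concat([df1, df2_like_df1], ignore_index=True)
print(result)
```

Filling with a number rather than NaN should also avoid the upcast of B and C to float, though that is worth checking on your own pandas version.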

Transpose first, merge, and then transpose back:

df1.T.merge(df2.T, how="left", left_index=True, right_index=True).T

    A   B   C   D
0_x 1.0 1.0 2.0 2.0
1_x 3.0 3.0 4.0 4.0
2   5.0 5.0 6.0 6.0
0_y 7.0 NaN NaN 8.0
1_y 9.0 NaN NaN 0.0

df1.T           df2.T

    0   1   2      0 1
A   1   3   5   A  7 9
B   1   3   5   E  7 9
C   2   4   6   D  8 0
D   2   4   6   F  8 0

Now the result can be obtained with a merge using how="left", with the indices as the join key by passing left_index=True and right_index=True.

df1.T.merge(df2.T, how="left", left_index=True, right_index=True)

    0_x 1_x 2   0_y 1_y
A   1   3   5   7.0 9.0
B   1   3   5   NaN NaN
C   2   4   6   NaN NaN
D   2   4   6   8.0 0.0
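
After transposing back, the rows carry the suffixed labels (0_x, 0_y and so on) that merge generated. One possible way to tidy that up is sketched below, assuming the frames from the question. The reset_index step is an addition for illustration and not part of the answer above.

```python
import pandas as pd

# Frames as given in the question.
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [1, 3, 5], "C": [2, 4, 6], "D": [2, 4, 6]})
df2 = pd.DataFrame({"A": [7, 9], "E": [7, 9], "D": [8, 0], "F": [8, 0]})

# Left-merge the transposed frames on their index (the original column names),
# so only A, B, C and D survive as rows.
merged = df1.T.merge(df2.T, how="left", left_index=True, right_index=True)

# Transpose back and replace the suffixed row labels with 0..4.
result = merged.T.reset_index(drop=True)
print(result)
```

Note that every value goes through a transpose alongside NaN, so all columns end up as floats with this approach.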
