Pandas - Merging Different Sized DataFrames

I am having an issue merging two DataFrames with different numbers of rows. The first DataFrame has 5K rows, and the second has 20K rows. Both frames have an "id" column, and all 5K "id" values from the first frame also occur in the frame with 20K rows.

First frame "df":

     A    B    id    A_1    B_1
0    1    1    1     0.5    0.5
1    3    2    2     0.2    0.4
2    3    4    3     0.8    0.9

Second frame "df_2":

     A    B    id    
0    1    1    1    
1    3    2    2    
2    3    4    3    
3    1    2    4    
4    3    1    5     

Desired output frame "df_out":

     A    B    id    A_1    B_1
0    1    1    1     0.5    0.5
1    3    2    2     0.2    0.4
2    3    4    3     0.8    0.9
3    1    2    4     na     na
4    3    1    5     na     na

My attempts to merge on 'id' have left me with only the 5K rows. The operation I am looking for should preserve all the rows of the large DataFrame and fill in NaN values for the columns (A_1, B_1) wherever the small frame has no matching row.

Thanks

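Just pass how='outer' to df.merge so that the result uses the union of the keys from both DataFrames. The default is how='inner', which keeps only the rows whose keys appear in both frames; that is why you ended up with only the 5K rows. Note that when no `on` argument is given, merge joins on all columns the two frames have in common (here A, B and id).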

>>> df.merge(df_2, how='outer')
     A  A_1    B  B_1   id
0  1.0  0.5  1.0  0.5  1.0
1  3.0  0.2  2.0  0.4  2.0
2  3.0  0.8  4.0  0.9  3.0
3  1.0  NaN  2.0  NaN  4.0
4  3.0  NaN  1.0  NaN  5.0
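
Since every "id" in the small frame also occurs in the large one, a left merge starting from the large frame should give the same rows. The snippet below is only a sketch built from the sample data in the question; it assumes that "id" uniquely identifies a row in both frames and that A and B agree wherever the ids match, so only A_1 and B_1 need to be carried over from df.

```python
import pandas as pd

# Sample frames copied from the question.
df = pd.DataFrame({
    "A": [1, 3, 3],
    "B": [1, 2, 4],
    "id": [1, 2, 3],
    "A_1": [0.5, 0.2, 0.8],
    "B_1": [0.5, 0.4, 0.9],
})
df_2 = pd.DataFrame({
    "A": [1, 3, 3, 1, 3],
    "B": [1, 2, 4, 2, 1],
    "id": [1, 2, 3, 4, 5],
})

# Keep every row of the large frame (how="left") and pull in only the
# extra columns from the small frame, matching explicitly on "id".
# Assumption: "id" is unique in both frames.
df_out = df_2.merge(df[["id", "A_1", "B_1"]], on="id", how="left")

# The outer merge from the answer, for comparison. With no `on` argument
# it joins on all shared columns (A, B and id).
df_outer = df.merge(df_2, how="outer")

print(df_out)
```

Rows of df_2 with no match in df (ids 4 and 5 here) get NaN in A_1 and B_1. Because the left frame supplies A, B and id, those columns keep their integer dtype.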
