
merging two dataframes with same rows and indexes in pandas

I'm trying to merge two pandas dataframes that share the same row index and the same values in columns 0 and 1 (year and month), but hold different values in column 2. The resulting dataframe should keep the key columns once and carry the value columns from both:

First dataframe:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 817 entries, 0 to 816
Data columns (total 3 columns):
0    817 non-null int64
1    817 non-null int64
2    817 non-null float64
dtypes: float64(1), int64(2)
memory usage: 19.2 KB


      0  1          2
0  1950  1  -0.060310
1  1950  2   0.626810
2  1950  3  -0.008128
3  1950  4   0.555100
4  1950  5   0.071577

Second dataframe:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 817 entries, 0 to 816
Data columns (total 3 columns):
0    817 non-null int64
1    817 non-null int64
2    817 non-null float64
dtypes: float64(1), int64(2)
memory usage: 19.2 KB

      0  1      2
0  1950  1   0.92
1  1950  2   0.40
2  1950  3  -0.36
3  1950  4   0.73
4  1950  5  -0.59

So far I tried with merge:

pd.merge(df, df2, left_index=True, right_index=True, how='outer')

But results are not what I expect:

    0_x     1_x     2_x     0_y     1_y     2_y
0   1950    1   -0.060310   1950    1   0.92
1   1950    2   0.626810    1950    2   0.40
2   1950    3   -0.008128   1950    3   -0.36
3   1950    4   0.555100    1950    4   0.73
4   1950    5   0.071577    1950    5   -0.59

And with concat:

pd.concat([df, df2], axis=1, ignore_index=True).head()


      0  1          2     3  4      5
0  1950  1  -0.060310  1950  1   0.92
1  1950  2   0.626810  1950  2   0.40
2  1950  3  -0.008128  1950  3  -0.36
3  1950  4   0.555100  1950  4   0.73
4  1950  5   0.071577  1950  5  -0.59

I'm expecting something like:

      0  1          2      3
0  1950  1  -0.060310   0.92
1  1950  2   0.626810   0.40
2  1950  3  -0.008128  -0.36
3  1950  4   0.555100   0.73
4  1950  5   0.071577  -0.59

EDIT: Maybe I was unclear, and I apologize if so. I'm trying to add the last column of the second dataset to the result, so that it has the same year and month columns followed by value1 and value2.

I would try merging on the two key columns. If the column labels are strings:

pd.merge(df, df2, on=['0', '1'])

or, if they are integers:

pd.merge(df, df2, on=[0, 1])
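
As a minimal sketch of the merge on the key columns, assuming the column labels are the integers 0, 1 and 2, and using only the five rows shown in the question as stand-in data. Renaming the value column of df2 to 3 before merging is an addition for illustration; it avoids the _x/_y suffixes and yields the layout the question asks for.

```python
import pandas as pd

# Stand-in data: only the first five rows printed in the question.
df = pd.DataFrame({
    0: [1950] * 5,
    1: [1, 2, 3, 4, 5],
    2: [-0.060310, 0.626810, -0.008128, 0.555100, 0.071577],
})
df2 = pd.DataFrame({
    0: [1950] * 5,
    1: [1, 2, 3, 4, 5],
    2: [0.92, 0.40, -0.36, 0.73, -0.59],
})

# Give df2's value column its own label so the merge adds no suffixes,
# then join on the shared year (0) and month (1) columns.
merged = pd.merge(df, df2.rename(columns={2: 3}), on=[0, 1])
print(merged)
```

Without the rename, the same merge should still work, but the two value columns come back labelled 2_x and 2_y.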

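Just do:

df.merge(df2, on=[0, 1])

You don't need to pass the index, since both frames already share it, and the default inner join is enough here. Merge on both key columns, though: with on=1 alone, each month would be matched against the same month of every year.

Your mistake was merging on the index only. In that case merge has no way of knowing that columns 0 and 1 hold the same values in both frames, so it keeps both copies and adds the _x and _y suffixes.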
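
Since both frames share the same index, another option (not proposed in the answers above, shown here only as a sketch) is to skip the merge and copy the value column across directly. This assumes that row i of df and row i of df2 really describe the same year and month, which the snippet checks first; the stand-in data is again just the five rows from the question.

```python
import pandas as pd

# Stand-in data: only the first five rows printed in the question.
df = pd.DataFrame({
    0: [1950] * 5,
    1: [1, 2, 3, 4, 5],
    2: [-0.060310, 0.626810, -0.008128, 0.555100, 0.071577],
})
df2 = pd.DataFrame({
    0: [1950] * 5,
    1: [1, 2, 3, 4, 5],
    2: [0.92, 0.40, -0.36, 0.73, -0.59],
})

# Guard: the year/month columns must match row for row,
# otherwise index alignment would pair up unrelated rows.
if not df[[0, 1]].equals(df2[[0, 1]]):
    raise ValueError("key columns differ; merge on [0, 1] instead")

# Assignment aligns on the shared index and adds df2's values as column 3.
result = df.copy()
result[3] = df2[2]
print(result)
```

If the guard fails, merging on the key columns is the safer route, because it matches rows by year and month rather than by position.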
