
Pandas merge two df

I have two DataFrames.

df1 has the following form:

    ID    col1    col2
0   1     2       10
1   3     1       21

and df2 looks like this:

    ID    field1    field2
0   1     4         1
1   1     3         3
2   3     5         4
3   3     9         5
4   1     2         0

I want to combine both DataFrames so that there is only one row per ID, like this:

    ID   col1    col2   field1_1    field2_1    field1_2    field2_2    field1_3    field2_3
0   1    2       10     4           1           3           3           2           0
1   3    1       21     5           4           9           5

I have tried merging and then pivoting the data with df.pivot(index=df1.index, columns='ID'), but because the number of rows per ID varies, I get a ValueError:

ValueError: all arrays must be same length
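Part of that error likely comes from passing df1.index (two labels) as the index while the merged frame being pivoted has five rows. But even with a real column as the index, pivot has nothing unique to spread into columns: ID 1 has three matching df2 rows and ID 3 has two. A minimal sketch, assuming df1 and df2 are exactly the tables above (the name merged is only illustrative), that reproduces this:

```python
import pandas as pd

# df1 and df2 rebuilt from the tables in the question
df1 = pd.DataFrame({"ID": [1, 3], "col1": [2, 1], "col2": [10, 21]})
df2 = pd.DataFrame({"ID": [1, 1, 3, 3, 1],
                    "field1": [4, 3, 5, 9, 2],
                    "field2": [1, 3, 4, 5, 0]})

merged = df1.merge(df2)  # inner join on the shared "ID" column -> 5 rows

# Every existing column repeats within an ID, e.g. (ID=1, col1=2) occurs 3 times,
# so pivot cannot place each value in a single cell.
try:
    merged.pivot(index="ID", columns="col1", values="field1")
    pivot_failed = False
except ValueError as exc:  # pandas refuses to reshape duplicate (index, column) pairs
    pivot_failed = True
    print("pivot failed:", exc)
```

Adding a per-ID counter, as the answer below does, is what makes the keys unique.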

Leaving the formatting aside for a moment, we want to merge and then add a MultiIndex level that counts the rows within each 'ID'.

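df = df1.merge(df2)                # inner join on the shared 'ID' column
cc = df.groupby('ID').cumcount()   # 0, 1, 2, ... within each ID
df.set_index(['ID', 'col1', 'col2', cc]).unstack()  # move the counter level into the columns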

             field1           field2          
                  0    1    2      0    1    2
ID col1 col2                                  
1  2    10      4.0  3.0  2.0    1.0  3.0  0.0
3  1    21      5.0  9.0  NaN    4.0  5.0  NaN
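The key piece is the counter. A small sketch, assuming df1 and df2 are rebuilt from the tables in the question, that prints the merged rows next to their cumcount values:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 3], "col1": [2, 1], "col2": [10, 21]})
df2 = pd.DataFrame({"ID": [1, 1, 3, 3, 1],
                    "field1": [4, 3, 5, 9, 2],
                    "field2": [1, 3, 4, 5, 0]})

df = df1.merge(df2)
# Position of each row within its ID group, starting at 0
cc = df.groupby("ID").cumcount()
print(df.assign(n=cc))
```

Because (ID, counter) is unique, unstack can turn the counter level into columns. The NaN cells are just counter positions that ID 3 never reaches, and since int64 cannot hold NaN, the values are upcast to floats.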

We can nail down the formatting (1-based counters, columns grouped by counter, flat names like field1_1) with:

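df = df1.merge(df2)
cc = df.groupby('ID').cumcount() + 1   # start counting at 1 to get field1_1, field1_2, ...
# unstack the counter, then order the columns by counter: field1_1, field2_1, field1_2, ...
d1 = df.set_index(['ID', 'col1', 'col2', cc]).unstack().sort_index(axis=1, level=1)
d1.columns = d1.columns.to_series().map('{0[0]}_{0[1]}'.format)  # ('field1', 1) -> 'field1_1'
d1.reset_index()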

   ID  col1  col2  field1_1  field2_1  field1_2  field2_2  field1_3  field2_3
0   1     2    10       4.0       1.0       3.0       3.0       2.0       0.0
1   3     1    21       5.0       4.0       9.0       5.0       NaN       NaN
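Since the question started from pivot, the same idea can also be written with pivot once the counter is a real column. This is an alternative sketch, not the answer's method: it assumes pandas 1.1 or newer (for a list-valued index in pivot), uses a hypothetical helper column named n, flattens the columns with an f-string, and optionally casts to the nullable Int64 dtype so values stay integers and the gaps show as <NA> instead of NaN:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 3], "col1": [2, 1], "col2": [10, 21]})
df2 = pd.DataFrame({"ID": [1, 1, 3, 3, 1],
                    "field1": [4, 3, 5, 9, 2],
                    "field2": [1, 3, 4, 5, 0]})

df = df1.merge(df2)
# Hypothetical helper column "n": 1-based position of each row within its ID
df["n"] = df.groupby("ID").cumcount() + 1

wide = (
    df.pivot(index=["ID", "col1", "col2"], columns="n", values=["field1", "field2"])
      .sort_index(axis=1, level=1)  # group columns by n: field1_1, field2_1, field1_2, ...
)
# Flatten the (field, n) column MultiIndex into names like "field1_1"
wide.columns = [f"{field}_{n}" for field, n in wide.columns]
# Optional: nullable integers print 4 instead of 4.0 and show gaps as <NA>
wide = wide.astype("Int64").reset_index()
print(wide)
```

Skip the astype("Int64") step if plain float columns with NaN are acceptable.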

