
Merge two pandas DataFrames, adding a MultiIndex column level and preserving the initial order

I'm trying to figure out how to merge two (or more) pandas DataFrames like this:

df1:

   |    ant    |       nac       |
   | uyn | yam | qlv | udb | rkd |
---|-----|-----|-----|-----|-----|
X1 |  6  |  1  |  8  |  4  |  5  |
X2 |  4  |  5  |  3  |  5  |  4  |
X3 |  2  |  9  |  2  |  9  |  4  |

df2:

   |    baz    |       ant       |
   | rjv | ifz | uyn | pgc | yam |
---|-----|-----|-----|-----|-----|
X1 |  2  |  1  |  7  |  3  |  8  |
X2 |  9  |  7  |  3  |  1  |  4  |
X3 |  2  |  1  |  6  |  2  |  9  |

into a dataframe like this:

   |             ant             |       nac       |    baz    |
   |    uyn    |    yam    | pgc | qlv | udb | rkd | rjv | ifz |
   | df1 | df2 | df1 | df2 | df2 | df1 | df1 | df1 | df2 | df2 |
---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
X1 |  6  |  7  |  1  |  8  |  3  |  8  |  4  |  5  |  2  |  1  |
X2 |  4  |  3  |  5  |  4  |  1  |  3  |  5  |  4  |  9  |  7  |
X3 |  2  |  6  |  9  |  9  |  2  |  2  |  9  |  4  |  2  |  1  |

I've tried concat, but then got stuck sorting the items in each level into the right order (the items are not alphabetically sorted). I also tried reindex and ended up with a lot of empty columns, because of combinations such as ant with udb that don't exist in the data.

Maybe I'm overcomplicating things. Is there an easier way to do this?

Try concat with keys, then reorder_levels:

new_df = (
    pd.concat((df1, df2), axis=1, keys=('df1', 'df2'))
        .reorder_levels([1, 2, 0], axis=1)
)
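
As a rough illustration of why reorder_levels is needed: concat places the keys in the outermost column level, so they have to be moved to the innermost position afterwards. The sketch below uses two small made-up frames (left and right are illustrative names, holding a subset of the question's columns) and prints the column tuples before and after the move.

```python
import pandas as pd

# Two small illustrative frames with two-level columns (a subset of the question's data).
left = pd.DataFrame(
    {('ant', 'uyn'): [6, 4, 2], ('nac', 'qlv'): [8, 3, 2]},
    index=['X1', 'X2', 'X3'],
)
right = pd.DataFrame(
    {('baz', 'rjv'): [2, 9, 2], ('ant', 'uyn'): [7, 3, 6]},
    index=['X1', 'X2', 'X3'],
)

# keys become a new outermost column level (level 0).
stacked = pd.concat((left, right), axis=1, keys=('df1', 'df2'))
before = stacked.columns.tolist()

# [1, 2, 0] lists the old level positions in their new order,
# so the key level (old level 0) ends up innermost.
moved = stacked.reorder_levels([1, 2, 0], axis=1)
after = moved.columns.tolist()

print(before)
print(after)
```

Note that reorder_levels only rearranges the levels inside each column tuple; the left-to-right order of the columns is unchanged, which is why a separate ordering step is still required.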

Optionally, assign the keys programmatically for the "or more" case:

dfs = (df1, df2)
new_df = (
    pd.concat(dfs, axis=1, keys=map('df{}'.format, range(1, len(dfs) + 1)))
        .reorder_levels([1, 2, 0], axis=1)
)

Result:

   ant     nac         baz     ant
   uyn yam qlv udb rkd rjv ifz uyn pgc yam
   df1 df1 df1 df1 df1 df2 df2 df2 df2 df2
X1   6   1   8   4   5   2   1   7   3   8
X2   4   5   3   5   4   9   7   3   1   4
X3   2   9   2   9   4   2   1   6   2   9
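
The ordering step that follows relies on first-occurrence order, not alphabetical order. A minimal sketch of the difference, using the top-level labels from the output above (the variable names are illustrative):

```python
import pandas as pd

# Top-level labels in the order they appear after concat.
top = pd.Index(['ant', 'ant', 'nac', 'nac', 'nac', 'baz', 'baz', 'ant', 'ant', 'ant'])

# drop_duplicates keeps the first occurrence of each label, preserving order.
first_seen = top.drop_duplicates().tolist()

# Sorting would also group the labels, but alphabetically.
alphabetical = sorted(set(top))

print(first_seen)    # ['ant', 'nac', 'baz']
print(alphabetical)  # ['ant', 'baz', 'nac']
```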

Then apply reindex once per level to "sort" the columns by first occurrence, first on level 0 and then on level 1:

new_df = (
    new_df.reindex(
        columns=new_df.columns.get_level_values(0).drop_duplicates(),
        level=0
    ).reindex(
        columns=new_df.columns.get_level_values(1).drop_duplicates(),
        level=1
    )
)

Result:

   ant                 nac         baz
   uyn     yam     pgc qlv udb rkd rjv ifz
   df1 df2 df1 df2 df2 df1 df1 df1 df2 df2
X1   6   7   1   8   3   8   4   5   2   1
X2   4   3   5   4   1   3   5   4   9   7
X3   2   6   9   9   2   2   9   4   2   1
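
If the chained reindex calls are hard to follow, the same ordering can be written with an explicit sort key. This is only a sketch of an alternative, not the method used above: concat_with_source_level is a hypothetical helper (not part of pandas), and it assumes every input frame has exactly two column levels and that the labels are unique.

```python
import pandas as pd


def concat_with_source_level(frames, labels):
    """Concatenate frames with two-level columns side by side, adding the
    source label as a third (innermost) column level.

    Hypothetical helper, not part of pandas. Columns are ordered by the
    first occurrence of each level-0 label, then each level-1 label,
    then by the position of the source label in `labels`.
    """
    combined = pd.concat(frames, axis=1, keys=labels).reorder_levels([1, 2, 0], axis=1)

    # Rank each label by where it is first seen, scanning columns left to right.
    top_rank, sub_rank = {}, {}
    for top, sub, _source in combined.columns:
        top_rank.setdefault(top, len(top_rank))
        sub_rank.setdefault(sub, len(sub_rank))
    source_rank = {label: pos for pos, label in enumerate(labels)}

    ordered = sorted(
        combined.columns,
        key=lambda col: (top_rank[col[0]], sub_rank[col[1]], source_rank[col[2]]),
    )
    return combined.reindex(columns=pd.MultiIndex.from_tuples(ordered))


# The question's data.
df1 = pd.DataFrame({('ant', 'uyn'): {'X1': 6, 'X2': 4, 'X3': 2},
                    ('ant', 'yam'): {'X1': 1, 'X2': 5, 'X3': 9},
                    ('nac', 'qlv'): {'X1': 8, 'X2': 3, 'X3': 2},
                    ('nac', 'udb'): {'X1': 4, 'X2': 5, 'X3': 9},
                    ('nac', 'rkd'): {'X1': 5, 'X2': 4, 'X3': 4}})
df2 = pd.DataFrame({('baz', 'rjv'): {'X1': 2, 'X2': 9, 'X3': 2},
                    ('baz', 'ifz'): {'X1': 1, 'X2': 7, 'X3': 1},
                    ('ant', 'uyn'): {'X1': 7, 'X2': 3, 'X3': 6},
                    ('ant', 'pgc'): {'X1': 3, 'X2': 1, 'X3': 2},
                    ('ant', 'yam'): {'X1': 8, 'X2': 4, 'X3': 9}})

result = concat_with_source_level([df1, df2], ['df1', 'df2'])
print(result)
```

For three or more frames, pass a longer list of frames and a matching list of labels. The sort key makes the ordering rule visible in one place, at the cost of a Python-level loop over the columns.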

The df1 and df2 used above:

import pandas as pd

df1 = pd.DataFrame({('ant', 'uyn'): {'X1': 6, 'X2': 4, 'X3': 2},
                    ('ant', 'yam'): {'X1': 1, 'X2': 5, 'X3': 9},
                    ('nac', 'qlv'): {'X1': 8, 'X2': 3, 'X3': 2},
                    ('nac', 'udb'): {'X1': 4, 'X2': 5, 'X3': 9},
                    ('nac', 'rkd'): {'X1': 5, 'X2': 4, 'X3': 4}})

df2 = pd.DataFrame({('baz', 'rjv'): {'X1': 2, 'X2': 9, 'X3': 2},
                    ('baz', 'ifz'): {'X1': 1, 'X2': 7, 'X3': 1},
                    ('ant', 'uyn'): {'X1': 7, 'X2': 3, 'X3': 6},
                    ('ant', 'pgc'): {'X1': 3, 'X2': 1, 'X3': 2},
                    ('ant', 'yam'): {'X1': 8, 'X2': 4, 'X3': 9}})
