I'm trying to figure out how I can merge two (or more) pandas DataFrames like this:
df1:
       | ant       | nac             |
       | uyn | yam | qlv | udb | rkd |
    ---|-----|-----|-----|-----|-----|
    X1 | 6   | 1   | 8   | 4   | 5   |
    X2 | 4   | 5   | 3   | 5   | 4   |
    X3 | 2   | 9   | 2   | 9   | 4   |
df2:
       | baz       | ant             |
       | rjv | ifz | uyn | pgc | yam |
    ---|-----|-----|-----|-----|-----|
    X1 | 2   | 1   | 7   | 3   | 8   |
    X2 | 9   | 7   | 3   | 1   | 4   |
    X3 | 2   | 1   | 6   | 2   | 9   |
into a dataframe like this:
       | ant                         | nac             | baz       |
       | uyn       | yam       | pgc | qlv | udb | rkd | rjv | ifz |
       | df1 | df2 | df1 | df2 | df2 | df1 | df1 | df1 | df2 | df2 |
    X1 | 6   | 7   | 1   | 8   | 3   | 8   | 4   | 5   | 2   | 1   |
    X2 | 4   | 3   | 5   | 4   | 1   | 3   | 5   | 4   | 9   | 7   |
    X3 | 2   | 6   | 9   | 9   | 2   | 2   | 9   | 4   | 2   | 1   |
I've tried to use `concat`, but then got stuck sorting the items in each level into the right order (the items are not alphabetically sorted). I also tried `reindex` and ended up with a lot of empty columns because of combinations such as `ant` with `udb`.
Maybe I'm overcomplicating things. Is there an easier way to do this?
Try `concat` with `keys`, then `reorder_levels`:
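    new_df = (
        # keys adds the source label as the outermost column level
        pd.concat((df1, df2), axis=1, keys=('df1', 'df2'))
        # move that label from the outermost to the innermost level
        .reorder_levels([1, 2, 0], axis=1)
    )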
Optionally, assign the keys programmatically for the "or more" case:
    dfs = (df1, df2)
    new_df = (
        # generate the labels 'df1', 'df2', ... from the number of frames
        pd.concat(dfs, axis=1, keys=map('df{}'.format, range(1, len(dfs) + 1)))
        .reorder_levels([1, 2, 0], axis=1)
    )
       ant     nac         baz     ant
       uyn yam qlv udb rkd rjv ifz uyn pgc yam
       df1 df1 df1 df1 df1 df2 df2 df2 df2 df2
    X1   6   1   8   4   5   2   1   7   3   8
    X2   4   5   3   5   4   9   7   3   1   4
    X3   2   9   2   9   4   2   1   6   2   9
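To see what each call contributes, here is a minimal sketch on two tiny stand-in frames (`left` and `right` are hypothetical names and data, not the frames from the question). It suggests that `keys` becomes the outermost column level, and that `reorder_levels` only changes the position of the levels inside each column label without moving any columns:

```python
import pandas as pd

# Hypothetical stand-in frames with two-level columns (not the question's data).
left = pd.DataFrame({('ant', 'uyn'): [1, 2], ('nac', 'qlv'): [3, 4]})
right = pd.DataFrame({('baz', 'rjv'): [5, 6], ('ant', 'uyn'): [7, 8]})

# keys adds a new outermost column level holding the source label.
stacked = pd.concat((left, right), axis=1, keys=('df1', 'df2'))
before = list(stacked.columns)

# Rotate the levels so the source label ends up innermost.
# The columns themselves stay in the same left-to-right order.
moved = stacked.reorder_levels([1, 2, 0], axis=1)
after = list(moved.columns)

print(before)
print(after)
```

Because the column order is untouched, the two `('ant', 'uyn', ...)` columns are still far apart at this point, which is why a separate ordering step is needed.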
Then use successive `reindex` calls to "sort" each level by the order in which its values first occur:
    new_df = (
        new_df.reindex(
            # level 0 values in order of first appearance
            columns=new_df.columns.get_level_values(0).drop_duplicates(),
            level=0
        ).reindex(
            # level 1 values in order of first appearance
            columns=new_df.columns.get_level_values(1).drop_duplicates(),
            level=1
        )
    )
       ant                 nac         baz
       uyn     yam     pgc qlv udb rkd rjv ifz
       df1 df2 df1 df2 df2 df1 df1 df1 df2 df2
    X1   6   7   1   8   3   8   4   5   2   1
    X2   4   3   5   4   1   3   5   4   9   7
    X3   2   6   9   9   2   2   9   4   2   1
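If the level-wise `reindex` feels opaque, the same "first occurrence" ordering can be sketched as a plain positional sort. This is an alternative technique, not the approach used above: the helpers `first_seen_rank` and `order_by_first_occurrence` are hypothetical names, and the sketch assumes three-level columns laid out as (level 0, level 1, source label).

```python
import pandas as pd


def first_seen_rank(values):
    """Map each distinct value to the position at which it first appears."""
    rank = {}
    for value in values:
        rank.setdefault(value, len(rank))
    return rank


def order_by_first_occurrence(frame):
    """Reorder columns by first appearance of their level 0, then level 1 values."""
    cols = list(frame.columns)
    rank0 = first_seen_rank(c[0] for c in cols)
    rank1 = first_seen_rank(c[1] for c in cols)
    # sorted() is stable, so ties (same level 0 and level 1) keep the
    # original order, i.e. df1 before df2.
    positions = sorted(
        range(len(cols)),
        key=lambda i: (rank0[cols[i][0]], rank1[cols[i][1]]),
    )
    return frame.iloc[:, positions]


idx = ['X1', 'X2', 'X3']
df1 = pd.DataFrame({('ant', 'uyn'): [6, 4, 2], ('ant', 'yam'): [1, 5, 9],
                    ('nac', 'qlv'): [8, 3, 2], ('nac', 'udb'): [4, 5, 9],
                    ('nac', 'rkd'): [5, 4, 4]}, index=idx)
df2 = pd.DataFrame({('baz', 'rjv'): [2, 9, 2], ('baz', 'ifz'): [1, 7, 1],
                    ('ant', 'uyn'): [7, 3, 6], ('ant', 'pgc'): [3, 1, 2],
                    ('ant', 'yam'): [8, 4, 9]}, index=idx)

combined = (
    pd.concat((df1, df2), axis=1, keys=('df1', 'df2'))
    .reorder_levels([1, 2, 0], axis=1)
)
result = order_by_first_occurrence(combined)
print(result)
```

Sorting integer positions and selecting with `iloc` avoids any label lookup on the unsorted MultiIndex, at the cost of hard-coding which levels drive the order.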
The `df1` and `df2` used:
    import pandas as pd

    # Tuple keys become two-level (MultiIndex) column labels.
    df1 = pd.DataFrame({('ant', 'uyn'): {'X1': 6, 'X2': 4, 'X3': 2},
                        ('ant', 'yam'): {'X1': 1, 'X2': 5, 'X3': 9},
                        ('nac', 'qlv'): {'X1': 8, 'X2': 3, 'X3': 2},
                        ('nac', 'udb'): {'X1': 4, 'X2': 5, 'X3': 9},
                        ('nac', 'rkd'): {'X1': 5, 'X2': 4, 'X3': 4}})
    df2 = pd.DataFrame({('baz', 'rjv'): {'X1': 2, 'X2': 9, 'X3': 2},
                        ('baz', 'ifz'): {'X1': 1, 'X2': 7, 'X3': 1},
                        ('ant', 'uyn'): {'X1': 7, 'X2': 3, 'X3': 6},
                        ('ant', 'pgc'): {'X1': 3, 'X2': 1, 'X3': 2},
                        ('ant', 'yam'): {'X1': 8, 'X2': 4, 'X3': 9}})