I have two data frames with the same column labels, as shown below:
import pandas as pd

df1 = pd.DataFrame({'key_1': {0: 'F', 1: 'H', 2: 'E'},
                    'key_2': {0: 'F', 1: 'G', 2: 'E'},
                    'min': {0: -158, 1: -881, 2: -674},
                    'count': {0: 58, 1: 24, 2: 13}})
df2 = pd.DataFrame({'key_1': {0: 'C', 1: 'L', 2: 'F', 3: 'K'},
                    'key_2': {0: 'C', 1: 'D', 2: 'F', 3: 'K'},
                    'min': {0: -452, 1: -153, 2: -181, 3: -120},
                    'count': {0: 7470, 1: 1262, 2: 171, 3: 86}})
pandas.DataFrame.compare is useful for a side-by-side comparison of each column, but it does not work for data frames with different rows:
df1.compare(df2, keep_shape=True, keep_equal=True)
ValueError: Can only compare identically-labeled DataFrame objects
Can we achieve the same functionality using pandas.merge?
I tried the code below, but it does NOT give a side-by-side comparison for each corresponding column:
pd.merge(df1,df2, on=['key_1','key_2'], suffixes=['_df1','_df2'], how='outer')
  key_1 key_2  min_df1  count_df1  min_df2  count_df2
0     F     F   -158.0       58.0   -181.0      171.0
1     H     G   -881.0       24.0      NaN        NaN
2     E     E   -674.0       13.0      NaN        NaN
3     C     C      NaN        NaN   -452.0     7470.0
4     L     D      NaN        NaN   -153.0     1262.0
5     K     K      NaN        NaN   -120.0       86.0
Use concat after converting ['key_1','key_2'] to a MultiIndex:
df = (pd.concat([df1.set_index(['key_1','key_2']),
                 df2.set_index(['key_1','key_2'])], keys=['df1','df2'], axis=1)
        .sort_index(level=1, axis=1))
print(df)
              df1     df2     df1     df2
            count   count     min     min
key_1 key_2
C     C       NaN  7470.0     NaN  -452.0
E     E      13.0     NaN  -674.0     NaN
F     F      58.0   171.0  -158.0  -181.0
H     G      24.0     NaN  -881.0     NaN
K     K       NaN    86.0     NaN  -120.0
L     D       NaN  1262.0     NaN  -153.0
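If the goal is to mimic the layout that compare produces, with the column name on the top level and the source frame underneath it, the two column levels can be swapped after the concat. Below is a minimal sketch of that idea, assuming the question's data is loaded with pd.DataFrame; the names keys and side_by_side are only illustrative.

```python
import pandas as pd

# Sample data from the question, built as real DataFrames.
df1 = pd.DataFrame({'key_1': ['F', 'H', 'E'],
                    'key_2': ['F', 'G', 'E'],
                    'min': [-158, -881, -674],
                    'count': [58, 24, 13]})
df2 = pd.DataFrame({'key_1': ['C', 'L', 'F', 'K'],
                    'key_2': ['C', 'D', 'F', 'K'],
                    'min': [-452, -153, -181, -120],
                    'count': [7470, 1262, 171, 86]})

keys = ['key_1', 'key_2']

side_by_side = (
    pd.concat([df1.set_index(keys), df2.set_index(keys)],
              keys=['df1', 'df2'], axis=1)   # top level: source frame
      .swaplevel(axis=1)                     # top level: column name
      .sort_index(axis=1)                    # group df1/df2 under each column
)
print(side_by_side)
```

Rows that exist in only one frame keep NaN on the other side, so the integer columns are upcast to float.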
After the merge, you can re-order the columns alphabetically so that they sit side by side:
first_columns = ['key_1','key_2']
merged_df = pd.merge(df1,df2, on=['key_1','key_2'], suffixes=['_df1','_df2'], how='outer')
merged_df = merged_df[first_columns + sorted([col for col in merged_df.columns if col not in first_columns ])]
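For reference, here is the same idea as a self-contained script, assuming the question's data is loaded with pd.DataFrame. It works because the suffixes are appended to the end of each name, so an alphabetical sort puts count_df1 next to count_df2 and min_df1 next to min_df2. The variable value_columns is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({'key_1': ['F', 'H', 'E'],
                    'key_2': ['F', 'G', 'E'],
                    'min': [-158, -881, -674],
                    'count': [58, 24, 13]})
df2 = pd.DataFrame({'key_1': ['C', 'L', 'F', 'K'],
                    'key_2': ['C', 'D', 'F', 'K'],
                    'min': [-452, -153, -181, -120],
                    'count': [7470, 1262, 171, 86]})

first_columns = ['key_1', 'key_2']
merged_df = pd.merge(df1, df2, on=first_columns,
                     suffixes=('_df1', '_df2'), how='outer')

# Keep the key columns first, then the suffixed columns in alphabetical order.
value_columns = sorted(col for col in merged_df.columns
                       if col not in first_columns)
merged_df = merged_df[first_columns + value_columns]

print(merged_df.columns.tolist())
# ['key_1', 'key_2', 'count_df1', 'count_df2', 'min_df1', 'min_df2']
```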
One way:
merged_df = pd.merge(df1, df2, on=['key_1', 'key_2'],
                     suffixes=['_df1', '_df2'], how='outer').set_index(['key_1', 'key_2'])
merged_df.columns = merged_df.columns.str.split('_', expand=True)
merged_df = merged_df.sort_index(level=0, axis=1)
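One caveat: splitting on every '_' assumes the original column names contain no underscores. If they might, splitting only on the last underscore with str.rsplit(..., n=1) keeps the name intact. A small sketch with hypothetical frames (left, right) and hypothetical column names (min_value, row_count), which are not from the question:

```python
import pandas as pd

# Hypothetical frames whose value columns contain underscores.
left = pd.DataFrame({'key_1': ['F', 'H'],
                     'min_value': [-158, -881],
                     'row_count': [58, 24]})
right = pd.DataFrame({'key_1': ['F', 'K'],
                      'min_value': [-181, -120],
                      'row_count': [171, 86]})

merged = (pd.merge(left, right, on='key_1',
                   suffixes=('_df1', '_df2'), how='outer')
            .set_index('key_1'))

# Split only on the LAST underscore so 'min_value_df1' -> ('min_value', 'df1').
merged.columns = merged.columns.str.rsplit('_', n=1, expand=True)
merged = merged.sort_index(axis=1)

print(merged.columns.tolist())
# [('min_value', 'df1'), ('min_value', 'df2'), ('row_count', 'df1'), ('row_count', 'df2')]
```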