Pandas: Merging and comparing dataframes
I've got 3 DataFrames that I would like to merge or join on "Label", so that I can then compare all of their columns.
Examples of the DataFrames are below:
df1
Label,col1,col2,col3
NF1,1,1,6
NF2,3,2,8
NF3,4,5,4
NF4,5,7,2
NF5,6,2,2
df2
Label,col1,col1,col3
NF1,8,4,5
NF2,4,7,8
NF3,9,7,8
df3
Label,col1,col1,col3
NF1,2,8,8
NF2,6,2,0
NF3,2,2,5
NF4,2,4,9
NF5,2,5,8
and what I'd like to see is similar to:
Label,df1_col1,df2_col1,df3_col1,df1_col2,df2_col2,df3_col2,df1_col3,df2_col3,df3_col3
NF1,1,8,2,1,4,8,6,5,8
NF2,3,4,6,2,7,2,8,8,0
NF3,4,9,2,5,7,2,4,8,5
NF4,5,,2,7,,4,2,,9
NF5,6,,2,2,,5,2,,8
but I'm open to suggestions on how to make the comparisons more readable.
Thanks!
Use concat with a list of DataFrames, add the keys parameter to supply the prefixes, and sort by column names:
dfs = [df1, df2, df3]
k = ('df1','df2','df3')
df = (pd.concat([x.set_index('Label') for x in dfs], axis=1, keys=k)
.sort_index(axis=1, level=1)
.rename_axis('Label')
.reset_index())
df.columns = df.columns.map('_'.join).str.strip('_')
print (df)
Label df1_col1 df2_col1 df3_col1 df2_col1.1 df3_col1.1 df1_col2 \
0 NF1 1 8.0 2 4.0 8 1
1 NF2 3 4.0 6 7.0 2 2
2 NF3 4 9.0 2 7.0 2 5
3 NF4 5 NaN 2 NaN 4 7
4 NF5 6 NaN 2 NaN 5 2
df1_col3 df2_col3 df3_col3
0 6 5.0 8
1 8 8.0 0
2 4 8.0 5
3 2 NaN 9
4 2 NaN 8
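The df2_col1.1 and df3_col1.1 columns in the output above appear because the sample headers for df2 and df3 list col1 twice, and pandas renames a repeated header to col1.1 when reading it. The following is a minimal, self-contained sketch of the same concat approach under the assumption that the second column was meant to be col2 in every frame; the variable name wide and the inline CSV strings are only for illustration.

```python
import io
import pandas as pd

# Sample data from the question. Assumption: the second column of df2 and df3
# is meant to be col2 (the question's headers list col1 twice).
df1 = pd.read_csv(io.StringIO(
    "Label,col1,col2,col3\nNF1,1,1,6\nNF2,3,2,8\nNF3,4,5,4\nNF4,5,7,2\nNF5,6,2,2"))
df2 = pd.read_csv(io.StringIO(
    "Label,col1,col2,col3\nNF1,8,4,5\nNF2,4,7,8\nNF3,9,7,8"))
df3 = pd.read_csv(io.StringIO(
    "Label,col1,col2,col3\nNF1,2,8,8\nNF2,6,2,0\nNF3,2,2,5\nNF4,2,4,9\nNF5,2,5,8"))

keys = ["df1", "df2", "df3"]
wide = pd.concat([x.set_index("Label") for x in (df1, df2, df3)],
                 axis=1, keys=keys)

# Sort on the inner level (the column name) so the same column
# from each frame ends up side by side.
wide = wide.sort_index(axis=1, level=1)

# Flatten the (frame, column) MultiIndex into names such as df1_col1.
wide.columns = wide.columns.map("_".join)
wide = wide.rename_axis("Label").reset_index()
print(wide)
```

Flattening the columns before calling reset_index means there is no trailing underscore to strip from Label afterwards. Labels missing from df2 (NF4 and NF5) show up as NaN, which also turns the df2 columns into floats.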
You can use df.merge:
In [1965]: res = df1.merge(df2, on='Label', how='left', suffixes=('_df1', '_df2')).merge(df3, on='Label', how='left').rename(columns={'col1': 'col1_df3','col2':'col2_df3','col3':'col3_df3'})
In [1975]: res = res.reindex(sorted(res.columns), axis=1)
In [1976]: res
Out[1965]:
Label col1_df1 col1_df2 col1_df3 col2_df1 col2_df2 col2_df3 col3_df1 col3_df2 col3_df3
0 NF1 1 8.00 2 1 4.00 8 6 5.00 8
1 NF2 3 4.00 6 2 7.00 2 8 8.00 0
2 NF3 4 9.00 2 5 7.00 2 4 8.00 5
3 NF4 5 nan 2 7 nan 4 2 nan 9
4 NF5 6 nan 2 2 nan 5 2 nan 8
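The chained merge above needs a hand-written rename for the last frame, which gets tedious with more than three frames. One possible variation, shown here as a sketch rather than as part of the original answer, is to suffix every frame up front with add_suffix and fold the merges with functools.reduce. The names frames, tagged and res are illustrative, and the sample data again assumes that the second column of df2 and df3 is col2.

```python
import io
from functools import reduce

import pandas as pd

# Sample data from the question (assumption: every frame has col1, col2, col3).
df1 = pd.read_csv(io.StringIO(
    "Label,col1,col2,col3\nNF1,1,1,6\nNF2,3,2,8\nNF3,4,5,4\nNF4,5,7,2\nNF5,6,2,2"))
df2 = pd.read_csv(io.StringIO(
    "Label,col1,col2,col3\nNF1,8,4,5\nNF2,4,7,8\nNF3,9,7,8"))
df3 = pd.read_csv(io.StringIO(
    "Label,col1,col2,col3\nNF1,2,8,8\nNF2,6,2,0\nNF3,2,2,5\nNF4,2,4,9\nNF5,2,5,8"))

frames = {"df1": df1, "df2": df2, "df3": df3}

# Suffix each value column with its frame name. Label is moved into the
# index first so that the merge key itself is not renamed.
tagged = [
    frame.set_index("Label").add_suffix(f"_{name}").reset_index()
    for name, frame in frames.items()
]

# Left-merge every later frame onto the first one, as in the answer above.
res = reduce(lambda left, right: left.merge(right, on="Label", how="left"), tagged)

# Keep Label first and order the remaining columns alphabetically.
res = res[["Label"] + sorted(c for c in res.columns if c != "Label")]
print(res)
```

Because how='left' is used, only labels present in df1 are kept; switching to how='outer' would keep labels that appear in any frame.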
We can use pandas' join method, setting the Label column as the index and then joining the DataFrames:
dfs = [df1,df2,df3]
keys = ['df1','df2','df3']
#set Label as index
df1, *others = [frame.set_index("Label").add_prefix(f"{prefix}_")
for frame,prefix in zip(dfs,keys)]
#join df1 with others
outcome = df1.join(others,how='outer').rename_axis(index='Label').reset_index()
outcome
Label df1_col1 df1_col2 df1_col3 df2_col1 df2_col2 df2_col3 df3_col1 df3_col2 df3_col3
0 NF1 1 1 6 8.0 4.0 5.0 2 8 8
1 NF2 3 2 8 4.0 7.0 8.0 6 2 0
2 NF3 4 5 4 9.0 7.0 8.0 2 2 5
3 NF4 5 7 2 NaN NaN NaN 2 4 9
4 NF5 6 2 2 NaN NaN NaN 2 5 8
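On the question of making the comparison more readable, one option (a suggestion, not something the answers above propose) is to keep the two-level column index instead of flattening it, with the column name as the outer level. Selecting a single column name then returns the values from all three frames side by side. The sketch below assumes the second column of df2 and df3 is col2; grouped, col1_view and diff_vs_df1 are illustrative names.

```python
import io
import pandas as pd

# Sample data from the question (assumption: every frame has col1, col2, col3).
df1 = pd.read_csv(io.StringIO(
    "Label,col1,col2,col3\nNF1,1,1,6\nNF2,3,2,8\nNF3,4,5,4\nNF4,5,7,2\nNF5,6,2,2"))
df2 = pd.read_csv(io.StringIO(
    "Label,col1,col2,col3\nNF1,8,4,5\nNF2,4,7,8\nNF3,9,7,8"))
df3 = pd.read_csv(io.StringIO(
    "Label,col1,col2,col3\nNF1,2,8,8\nNF2,6,2,0\nNF3,2,2,5\nNF4,2,4,9\nNF5,2,5,8"))

grouped = pd.concat(
    [x.set_index("Label") for x in (df1, df2, df3)],
    axis=1, keys=["df1", "df2", "df3"],
)

# Put the column name on the outer level and the frame name on the inner one,
# then sort so each column name forms one contiguous group.
grouped = grouped.swaplevel(axis=1).sort_index(axis=1)

# All three versions of col1, one column per source frame.
col1_view = grouped["col1"]
print(col1_view)

# Difference of each frame's col1 from df1's col1, row by row.
diff_vs_df1 = col1_view.sub(col1_view["df1"], axis=0)
print(diff_vs_df1)
```

The same selection works for col2 and col3, and the grouped frame can still be flattened later with grouped.columns.map('_'.join) if a single header row is needed for export.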