
Pandas: Merging and comparing dataframes

I've got 3 DataFrames that I would like to merge or join on "Label" so that I can then compare all of their columns.

Examples of the DataFrames are below:

df1

Label,col1,col2,col3
NF1,1,1,6
NF2,3,2,8
NF3,4,5,4
NF4,5,7,2
NF5,6,2,2

df2

Label,col1,col1,col3
NF1,8,4,5
NF2,4,7,8
NF3,9,7,8

df3

Label,col1,col1,col3
NF1,2,8,8
NF2,6,2,0
NF3,2,2,5
NF4,2,4,9
NF5,2,5,8

and what I'd like to see is similar to:

Label,df1_col1,df2_col1,df3_col1,df1_col2,df2_col2,df3_col2,df1_col3,df2_col3,df3_col3
NF1,1,8,2,1,4,8,6,5,8
NF2,3,4,6,2,7,2,8,8,0
NF3,4,9,2,5,7,2,4,8,5
NF4,5,,2,7,,4,2,,9
NF5,6,,2,2,,5,2,,8

but I'm open to suggestions on how to make the comparisons more readable.

Thanks!

Use concat with a list of DataFrames, pass the keys parameter to get a prefix per frame, and sort by column name:

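import pandas as pd

dfs = [df1, df2, df3]
k = ('df1', 'df2', 'df3')
# keys= adds the frame name as the outer column level;
# sorting on level 1 groups the columns by their original name
df = (pd.concat([x.set_index('Label') for x in dfs], axis=1, keys=k)
        .sort_index(axis=1, level=1)
        .rename_axis('Label')
        .reset_index())
# flatten the two-level column names into 'df1_col1', 'df2_col1', ...
df.columns = df.columns.map('_'.join).str.strip('_')
print(df)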
  Label  df1_col1  df2_col1  df3_col1  df2_col1.1  df3_col1.1  df1_col2  \
0   NF1         1       8.0         2         4.0           8         1   
1   NF2         3       4.0         6         7.0           2         2   
2   NF3         4       9.0         2         7.0           2         5   
3   NF4         5       NaN         2         NaN           4         7   
4   NF5         6       NaN         2         NaN           5         2   

   df1_col3  df2_col3  df3_col3  
0         6       5.0         8  
1         8       8.0         0  
2         4       8.0         5  
3         2       NaN         9  
4         2       NaN         8  
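
Since the question also asks for a more readable comparison, one option is to keep the two-level columns instead of flattening them. The following is only a sketch: the sample frames are rebuilt from the question, the second column of df2 and df3 is assumed to be col2 (the question's header repeats col1), and the names frames, wide and col1_side_by_side are made up for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the question. Assumption: the second column of
# df2 and df3 is meant to be "col2" (the question's header repeats "col1").
df1 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3", "NF4", "NF5"],
                    "col1": [1, 3, 4, 5, 6],
                    "col2": [1, 2, 5, 7, 2],
                    "col3": [6, 8, 4, 2, 2]})
df2 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3"],
                    "col1": [8, 4, 9],
                    "col2": [4, 7, 7],
                    "col3": [5, 8, 8]})
df3 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3", "NF4", "NF5"],
                    "col1": [2, 6, 2, 2, 2],
                    "col2": [8, 2, 2, 4, 5],
                    "col3": [8, 0, 5, 9, 8]})

frames = {"df1": df1, "df2": df2, "df3": df3}  # hypothetical helper dict

# Passing a dict to concat uses its keys as the outer column level.
wide = pd.concat({name: f.set_index("Label") for name, f in frames.items()},
                 axis=1)

# Put the column name on top and the source frame underneath, then sort so
# the three versions of each column sit next to each other.
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Selecting a top-level key returns that column from every frame side by side.
col1_side_by_side = wide["col1"]
print(col1_side_by_side)
```

With this layout, wide["col2"] or wide["col3"] gives the same side-by-side view for the other columns, and labels missing from df2 simply show up as NaN.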

You can use df.merge:

In [1965]: res = (df1.merge(df2, on='Label', how='left', suffixes=('_df1', '_df2'))
      ...:           .merge(df3, on='Label', how='left')
      ...:           .rename(columns={'col1': 'col1_df3', 'col2': 'col2_df3', 'col3': 'col3_df3'}))

In [1975]: res = res.reindex(sorted(res.columns), axis=1)

In [1976]: res
Out[1976]: 
  Label  col1_df1  col1_df2  col1_df3  col2_df1  col2_df2  col2_df3  col3_df1  col3_df2  col3_df3
0   NF1         1      8.00         2         1      4.00         8         6      5.00         8
1   NF2         3      4.00         6         2      7.00         2         8      8.00         0
2   NF3         4      9.00         2         5      7.00         2         4      8.00         5
3   NF4         5       nan         2         7       nan         4         2       nan         9
4   NF5         6       nan         2         2       nan         5         2       nan         8
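
If there are more than three frames, chaining merge calls and renaming by hand gets tedious. A possible generalisation, offered as a sketch rather than as part of the answer above, is to suffix each frame's columns first and then fold the left merges with functools.reduce. The sample data is rebuilt from the question with the second column of df2 and df3 assumed to be col2, and frames and suffixed are illustrative names.

```python
import io
from functools import reduce

import pandas as pd

# Sample data rebuilt from the question. Assumption: the second column of
# df2 and df3 is meant to be "col2" (the question's header repeats "col1").
df1 = pd.read_csv(io.StringIO("""Label,col1,col2,col3
NF1,1,1,6
NF2,3,2,8
NF3,4,5,4
NF4,5,7,2
NF5,6,2,2
"""))
df2 = pd.read_csv(io.StringIO("""Label,col1,col2,col3
NF1,8,4,5
NF2,4,7,8
NF3,9,7,8
"""))
df3 = pd.read_csv(io.StringIO("""Label,col1,col2,col3
NF1,2,8,8
NF2,6,2,0
NF3,2,2,5
NF4,2,4,9
NF5,2,5,8
"""))

frames = {"df1": df1, "df2": df2, "df3": df3}  # hypothetical helper dict

# Suffix every non-key column with its frame name up front, so no rename
# step is needed after merging.
suffixed = [f.set_index("Label").add_suffix(f"_{name}").reset_index()
            for name, f in frames.items()]

# Fold the frames together with successive left merges on Label.
res = reduce(lambda left, right: left.merge(right, on="Label", how="left"),
             suffixed)

# Sort the columns so the versions of each column end up adjacent;
# "Label" sorts first because uppercase letters precede lowercase ones.
res = res.reindex(sorted(res.columns), axis=1)
print(res)
```

Because every merge is a left merge, the result keeps exactly the labels of the first frame, as in the answer above; how="outer" would keep labels that appear in any frame.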

We can use pandas' join method: set the Label column as the index of each frame, then join the DataFrames:

dfs = [df1,df2,df3]
keys = ['df1','df2','df3']

# set Label as the index and prefix each frame's columns with its key
df1, *others = [frame.set_index("Label").add_prefix(f"{prefix}_")
                for frame, prefix in zip(dfs, keys)]

# outer-join df1 with the remaining frames on the index
outcome = df1.join(others, how='outer').rename_axis(index='Label').reset_index()

outcome


  Label  df1_col1  df1_col2  df1_col3  df2_col1  df2_col2  df2_col3  df3_col1  df3_col2  df3_col3
0   NF1         1         1         6       8.0       4.0       5.0         2         8         8
1   NF2         3         2         8       4.0       7.0       8.0         6         2         0
2   NF3         4         5         4       9.0       7.0       8.0         2         2         5
3   NF4         5         7         2       NaN       NaN       NaN         2         4         9
4   NF5         6         2         2       NaN       NaN       NaN         2         5         8
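
With prefixed column names like these, one way to compare a single column across the frames is DataFrame.filter. This is a small sketch under assumptions: the data is rebuilt from the question with the second column of df2 and df3 taken to be col2, and prefixed, first and col2_view are illustrative names.

```python
import pandas as pd

# Sample frames rebuilt from the question. Assumption: the second column of
# df2 and df3 is meant to be "col2" (the question's header repeats "col1").
labels = ["NF1", "NF2", "NF3", "NF4", "NF5"]
df1 = pd.DataFrame({"Label": labels,
                    "col1": [1, 3, 4, 5, 6],
                    "col2": [1, 2, 5, 7, 2],
                    "col3": [6, 8, 4, 2, 2]})
df2 = pd.DataFrame({"Label": labels[:3],
                    "col1": [8, 4, 9],
                    "col2": [4, 7, 7],
                    "col3": [5, 8, 8]})
df3 = pd.DataFrame({"Label": labels,
                    "col1": [2, 6, 2, 2, 2],
                    "col2": [8, 2, 2, 4, 5],
                    "col3": [8, 0, 5, 9, 8]})

# Index each frame by Label and prefix its columns with the frame name.
prefixed = [frame.set_index("Label").add_prefix(f"{name}_")
            for frame, name in zip([df1, df2, df3], ["df1", "df2", "df3"])]

# Outer-join the first frame with the rest on the shared index.
first, *others = prefixed
outcome = first.join(others, how="outer")

# Keep only the columns whose name ends in "_col2" to compare that column
# across all three frames.
col2_view = outcome.filter(regex=r"_col2$")
print(col2_view)
```

Changing the regex to r"_col1$" or r"_col3$" gives the same narrow view for the other columns.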
