Compare two similar columns in different data frames and keep the unmatched rows from both data frames

I have two data frames (df1 and df2), each with more than 100 columns. JA is the id column. I want to compare one pair of matching columns at a time and get the rows that differ between the two data frames, as in df3, which I created by hand for the BC column. I want to do this for the whole data frame, checking every column one by one rather than all columns at once, and producing something like df3 for each column I check. Is there any way to do that?

df1
       JA      AB     BC   fas   waa   ad
1      1       ace    52    5     2    ce
2      22      a e    3     5    78    ce
3      36      nas    4     4     5    as
4      45      kas    25    2    19    as
5      51                  25          sd
6      61      nas    45    8    32    as
7      897       a    34   13    34    qr
8      88      nas         12    0     as
9      29      jaa    1    10    45    aw
10     18      saa    34    0    98    aa

df2
       JA      AB     BC   fas   waa   ad
1      1       ace    52    5     2    ce
2      22      ace     3    5    18    ce
3      36      nas     1    4     5    as
4      45      kas    25   12    19    ms
5      51              5    5          sd
6      61      nas    45    8    32    as
7      897     paa    34   23          qr
8      88      nas    11   12     0    as
9      29       aa         10     5    aw
10     18      saa    34    0    98     a
 

df3
JA     BCdf1    BCdf2
36       4         1
51                 5
88                11
29       1
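
A minimal sketch of the per-column comparison described above, not a definitive implementation. It assumes the blank cells are NaN and that JA is unique within each frame; the helper name unmatched_rows and the five-row sample are made up for illustration, reusing a few BC values from df1 and df2.

```python
import numpy as np
import pandas as pd


def unmatched_rows(df1, df2, column, id_col="JA"):
    """Return the rows whose `column` value differs between df1 and df2."""
    # Align the two frames on the id column rather than on row position.
    merged = pd.merge(
        df1[[id_col, column]],
        df2[[id_col, column]],
        on=id_col,
        suffixes=("df1", "df2"),
    )
    left, right = merged[column + "df1"], merged[column + "df2"]
    # Two missing values count as equal; missing vs. present counts as different.
    same = left.eq(right) | (left.isna() & right.isna())
    return merged.loc[~same].reset_index(drop=True)


# Made-up sample: a few JA/BC pairs taken from the tables in the question.
df1 = pd.DataFrame({"JA": [1, 36, 51, 88, 29], "BC": [52, 4, np.nan, np.nan, 1]})
df2 = pd.DataFrame({"JA": [1, 36, 51, 88, 29], "BC": [52, 1, 5, 11, np.nan]})

df3 = unmatched_rows(df1, df2, "BC")
print(df3)

# One df3-style frame per non-id column, keyed by column name.
results = {c: unmatched_rows(df1, df2, c) for c in df1.columns if c != "JA"}
```

With the full frames, the dictionary comprehension at the end would give one result per column, which is the "one column at a time" behaviour asked for.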

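Processing flow:

  1. DataFrame.eq() creates a boolean mask by comparing the two data frames element-wise.
  2. The comparison is done in both directions (df1 against df2, and df2 against df1), and only the cells that differ are kept.
  3. The two results are merged on JA, and the columns are reordered so that each df1 column sits next to its df2 counterpart.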
import pandas as pd

# True where a cell in df1 equals the cell at the same position in df2
maskA = df1.eq(df2)

       JA     AB     BC    fas    waa     ad
1    True   True   True   True   True   True
2    True  False   True   True  False   True
3    True   True  False   True   True   True
4    True   True   True  False   True  False
5    True  False  False  False  False   True
6    True   True   True   True   True   True
7    True  False   True  False  False   True
8    True   True  False   True   True   True
9    True  False  False   True  False   True
10   True   True   True   True   True  False
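
Note that row 5 of the mask is False for AB and waa even though both frames are blank there. That is what eq() returns if the blanks are NaN, because NaN never compares equal to NaN. The following is a small sketch, under that assumption, of a mask that treats two missing cells as equal; the two-row frames a and b are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up two-row frames: the AB cell for JA 51 is missing in both.
a = pd.DataFrame({"JA": [51, 61], "AB": [np.nan, "nas"]})
b = pd.DataFrame({"JA": [51, 61], "AB": [np.nan, "nas"]})

plain = a.eq(b)                            # NaN vs. NaN evaluates to False
nan_safe = plain | (a.isna() & b.isna())   # count two missing cells as equal

print(plain)
print(nan_safe)
```

Whether two blanks should count as a match depends on the data; in the steps that follow the difference is hidden anyway, because the unmatched blanks are filled with ''.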

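# Keep only the cells that differ; matching cells become NaN, then ''
df_1 = df1[~maskA].fillna('')
maskB = df2.eq(df1)
df_2 = df2[~maskB].fillna('')
# Restore the id column, which was blanked out because JA matches in every row
df_1['JA'] = df1['JA']
df_2['JA'] = df2['JA']

df_all = pd.merge(df_1, df_2, on='JA', suffixes=('_df1','_df2')).sort_index(axis=1)
# Put JA first, followed by each df1/df2 column pair
cols = ['JA','AB_df1','AB_df2','BC_df1','BC_df2','ad_df1','ad_df2','fas_df1','fas_df2','waa_df1','waa_df2']
df_all = df_all.loc[:,cols]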

df_all
    JA   AB_df1  AB_df2  BC_df1  BC_df2  ad_df1  ad_df2  fas_df1  fas_df2  waa_df1  waa_df2
0   1
1   22   a e     ace                                                       78       18
2   36                   4       1
3   45                                   as      ms      2        12
4   51                           5                       25       5
5   61
6   897  a       paa                                     13       23       34
7   88                           11
8   29   jaa     aa      1                                                 45       5
9   18                                   aa      a
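
To get a df3-style result for a single column out of a wide result like df_all, one option is to select that column's pair and drop the rows that are blank on both sides. This is only a sketch: the helper name column_pair is hypothetical, and the small df_all below is a made-up stand-in holding just the BC pair.

```python
import pandas as pd

# Made-up stand-in for df_all: matching cells are '' in both columns.
df_all = pd.DataFrame({
    "JA": [1, 36, 51, 88, 29],
    "BC_df1": ["", 4, "", "", 1],
    "BC_df2": ["", 1, 5, 11, ""],
})


def column_pair(df_all, column, id_col="JA"):
    """Keep the id plus one df1/df2 column pair, dropping rows blank on both sides."""
    left, right = f"{column}_df1", f"{column}_df2"
    pair = df_all[[id_col, left, right]]
    both_blank = pair[left].eq("") & pair[right].eq("")
    return pair.loc[~both_blank].reset_index(drop=True)


df3 = column_pair(df_all, "BC")
print(df3)
```

Rows where the value was missing in both frames are also blank on both sides after fillna(''), so they are dropped along with the matching rows.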


 