简体   繁体   中英

How to check one dataframe column available or not in other four dataframe's column?

I'm having one base dataframe like below-

df1_data = {'id' :{0:'101',1:'102',2:'103',3:'104',4:'105'},
        'sym1' :{0:'abc',1:'pqr',2:'xyz',3:'mno',4:'lmn'}}
df1 = pd.DataFrame(df1_data)
print df1

    id sym1
0  101  abc
1  102  pqr
2  103  xyz
3  104  mno
4  105  lmn

From this dataframe, I want to check column sym1 available or not in other four dataframes column or not?

four different dataframes:

df2_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
        'name' :{0:'a',1:'b',2:'c',3:'d'}}
df2 = pd.DataFrame(df2_data)
print df2

df3_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
            'name' :{0:'h',1:'i',2:'k',3:'l'}}
df3 = pd.DataFrame(df2_data)
print df3

df4_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
            'name' :{0:'p',1:'q',2:'r',3:'s'}}
df4 = pd.DataFrame(df4_data)
print df4

df5_data = {'sym2' :{0:'abc',1:'xxx',2:'xyz',3:'mno'},
            'name' :{0:'w',1:'x',2:'y',3:'z'}}
df5 = pd.DataFrame(df5_data)
print df5

In column sym2 available in Dataframes df2,df3,df4,df5 may contain same symbols or may not contain same symbols. My intension is to check sym1 column values available or not in df2,df3,df4,df5 dataframes sym2 column values or not?

Expected output-

    id sym1
0  102  pqr
1  105  lmn

conclusion-

symbols pqr and lmn not available in sym2 column of dataframes df2,df3,df4 and df5.

  • use isin to check if each element of df1.sym1 is in some other iterable
  • use pd.concat to string all other dataframes together

df1[~df1.sym1.isin(pd.concat([df2, df3, df4, df5]).sym2)]

    id sym1
1  102  pqr
4  105  lmn

numpy variant that is 3 times quicker

df1[~df1.sym1.isin(np.concatenate([d.sym2.values for d in [df2, df3, df4, df5]]))]

Another solution for comparing with merge and parameter indicator :

dfs = [df2,df3,df4,df5]
df = pd.concat(dfs, keys=['df2','df3','df4','df5'])
print (df)
      name sym2
df2 0    a  abc
    1    b  xxx
    2    c  xyz
    3    d  mno
df3 0    a  abc
    1    b  xxx
    2    c  xyz
    3    d  mno
df4 0    p  abc
    1    q  xxx
    2    r  xyz
    3    s  mno
df5 0    w  abc
    1    x  xxx
    2    y  xyz
    3    z  mno
merged = pd.merge(df.rename_axis(['dfs','idx']).reset_index(), 
                  df1, 
                  left_on='sym2', 
                  right_on='sym1', 
                  how='outer', 
                  indicator=True)
print (merged)
    dfs  idx name sym2   id sym1      _merge
0   df2  0.0    a  abc  101  abc        both
1   df3  0.0    a  abc  101  abc        both
2   df4  0.0    p  abc  101  abc        both
3   df5  0.0    w  abc  101  abc        both
4   df2  1.0    b  xxx  NaN  NaN   left_only
5   df3  1.0    b  xxx  NaN  NaN   left_only
6   df4  1.0    q  xxx  NaN  NaN   left_only
7   df5  1.0    x  xxx  NaN  NaN   left_only
8   df2  2.0    c  xyz  103  xyz        both
9   df3  2.0    c  xyz  103  xyz        both
10  df4  2.0    r  xyz  103  xyz        both
11  df5  2.0    y  xyz  103  xyz        both
12  df2  3.0    d  mno  104  mno        both
13  df3  3.0    d  mno  104  mno        both
14  df4  3.0    s  mno  104  mno        both
15  df5  3.0    z  mno  104  mno        both
16  NaN  NaN  NaN  NaN  102  pqr  right_only
17  NaN  NaN  NaN  NaN  105  lmn  right_only

print (merged.loc[merged['_merge']=='right_only', ['id','sym1']])
     id sym1
16  102  pqr
17  105  lmn

print (merged.loc[merged['_merge']=='left_only', ['dfs', 'sym2']])
   dfs sym2
4  df2  xxx
5  df3  xxx
6  df4  xxx
7  df5  xxx

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM