How to Compare Values of Two DataFrames in Pandas?

Question

I have two DataFrames, df and df2, like this:

    id  initials
0   100 J
1   200 S
2   300 Y

    name  initials
0   John   J
1   Smith  S
2   Nathan N

I want to compare the values in the initials columns of df and df2 and, wherever an initial in df2 matches an initial in df, copy the corresponding name from df2 into df.

import pandas as pd

for i in df.initials:
    for j in df2.initials:
        if i == j:
            # copy the name value of this particular initial to df
            pass

The output should be like this:

    id  name
0   100 John
1   200 Smith
2   300

Any idea how to solve this problem?
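For reference, here is a minimal sketch that completes the nested loop from the question, assuming the sample data shown above. It fills a new name column row by row with df.at; rows without a match keep an empty string. It compares every row of df with every row of df2, so it is only reasonable for small frames.

```python
import pandas as pd

df = pd.DataFrame({'id': [100, 200, 300], 'initials': ['J', 'S', 'Y']})
df2 = pd.DataFrame({'name': ['John', 'Smith', 'Nathan'],
                    'initials': ['J', 'S', 'N']})

# Start with an empty name for every row; rows without a match keep ''.
df['name'] = ''
for idx, i in df['initials'].items():
    for j, name in zip(df2['initials'], df2['name']):
        if i == j:
            # copy the name value of this particular initial to df
            df.at[idx, 'name'] = name

df = df.drop('initials', axis=1)
print(df)
```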

Answer 1

df1
    id initials
0  100        J
1  200        S
2  300        Y

df2
     name initials
0    John        J
1   Smith        S
2  Nathan        N

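Use a Boolean mask: df2.initials == df1.initials compares the two initials columns element-wise, matching rows by index label, and tells you where the values are equal. This only works when both frames share the same index (here 0, 1, 2) and the matching initials sit on the same rows; pandas raises a ValueError if the two Series are not identically labeled.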

0     True
1     True
2    False
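A small sketch of the mask step, using the sample frames from the question. It shows the element-wise result and, as an illustration of the caveat, what happens when the two Series do not have the same labels (here a hypothetical two-row slice of df2).

```python
import pandas as pd

df1 = pd.DataFrame({'id': [100, 200, 300], 'initials': ['J', 'S', 'Y']})
df2 = pd.DataFrame({'name': ['John', 'Smith', 'Nathan'],
                    'initials': ['J', 'S', 'N']})

# Element-wise comparison by index label: row 0 with row 0, row 1 with row 1, ...
mask = df2['initials'] == df1['initials']
print(mask.tolist())  # [True, True, False]

# With differently labeled Series (only the first two rows of df2),
# pandas refuses to compare them and raises ValueError.
raised = False
try:
    df2['initials'].iloc[:2] == df1['initials']
except ValueError as err:
    raised = True
    print('ValueError:', err)
```

Because the comparison is by row label, reordering the rows of df2 would change the mask even though the same initials are present.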

Use this mask to create a new column:

df1['name'] = df2.name[df2.initials==df1.initials]
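Why row 2 ends up empty: filtering with the mask keeps only the rows labeled 0 and 1, and assigning a Series to a column aligns on the index, so the missing label 2 becomes NaN. A short sketch with the sample data:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [100, 200, 300], 'initials': ['J', 'S', 'Y']})
df2 = pd.DataFrame({'name': ['John', 'Smith', 'Nathan'],
                    'initials': ['J', 'S', 'N']})

matched = df2['name'][df2['initials'] == df1['initials']]
print(matched.index.tolist())  # [0, 1]: only the rows where the mask is True

# Column assignment aligns on the index, so row 2 (no match) becomes NaN.
df1['name'] = matched
print(df1['name'].isna().tolist())  # [False, False, True]
```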

Remove the initials column from df1. Note that drop returns a new DataFrame, so assign the result back:

df1 = df1.drop('initials', axis=1)
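A quick sketch showing that drop leaves the original frame untouched unless you keep its return value (or pass inplace=True); the columns= keyword is an equivalent way to name the column to remove.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [100, 200, 300], 'initials': ['J', 'S', 'Y']})

result = df1.drop('initials', axis=1)  # returns a new DataFrame
print(list(df1.columns))     # ['id', 'initials']: df1 itself is unchanged
print(list(result.columns))  # ['id']

# Keep the returned frame (equivalently: df1.drop(columns='initials', inplace=True)).
df1 = df1.drop(columns='initials')
print(list(df1.columns))     # ['id']
```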

Replace the NaN values using fillna(''):

df1.fillna('', inplace=True)  # modify df1 in place instead of returning a new DataFrame

    id   name
0  100   John
1  200  Smith
2  300
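The mask approach depends on matching initials sitting on the same rows of both frames. As an alternative (a different technique from the one above), a lookup that ignores row order can be sketched with Series.map, assuming each initial appears at most once in df2 so it can serve as a unique lookup index. Here df2 is deliberately reordered to show that order no longer matters.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [100, 200, 300], 'initials': ['J', 'S', 'Y']})
# Same data as df2 above, but in a different row order.
df2 = pd.DataFrame({'name': ['Smith', 'Nathan', 'John'],
                    'initials': ['S', 'N', 'J']})

# Build an initials -> name lookup and map df1's initials through it;
# initials with no match (Y) become NaN, which fillna('') blanks out.
lookup = df2.set_index('initials')['name']
df1['name'] = df1['initials'].map(lookup).fillna('')
df1 = df1.drop('initials', axis=1)
print(df1)
```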

Answer 2

How about this?

df3 = (df.merge(df2, on='initials', how='outer')
         .drop(['initials'], axis=1)
         .dropna(subset=['id']))
>>> df3
      id    name
0  100.0    John
1  200.0   Smith
2  300.0     NaN

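The outer merge keeps every initial from both frames. The 'initials' column is then dropped, and so is every row with NaN in the 'id' column, i.e. names from df2 (such as Nathan) whose initial does not occur in df. Because the outer merge temporarily introduces NaN into 'id', that column is upcast to float, which is why the ids print as 100.0.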
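To see which rows dropna(subset=['id']) removes, the merge can be run with indicator=True, which adds a _merge column saying whether each row came from the left frame only, the right frame only, or both. A sketch with the sample data:

```python
import pandas as pd

df = pd.DataFrame({'id': [100, 200, 300], 'initials': ['J', 'S', 'Y']})
df2 = pd.DataFrame({'name': ['John', 'Smith', 'Nathan'],
                    'initials': ['J', 'S', 'N']})

# indicator=True adds '_merge' with values 'left_only', 'right_only' or 'both'.
merged = df.merge(df2, on='initials', how='outer', indicator=True)
print(merged)

# Rows present only in df2 have no id; these are the rows dropna(subset=['id']) removes.
right_only = merged[merged['_merge'] == 'right_only']
print(right_only['name'].tolist())  # ['Nathan']
print(merged['id'].dtype)           # float64, because of the NaN id
```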

If you don't want NaN in the result, tack on a .fillna(''):

df3 = (df.merge(df2, on='initials', how='outer')
         .drop(['initials'], axis=1)
         .dropna(subset=['id'])
         .fillna(''))
>>> df3
      id   name
0  100.0   John
1  200.0  Smith
2  300.0
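A variation on the same merge idea, offered as a sketch rather than as the answerer's method: how='left' keeps exactly the rows of df, so no dropna step is needed and, since no NaN is ever introduced into 'id', that column stays integer.

```python
import pandas as pd

df = pd.DataFrame({'id': [100, 200, 300], 'initials': ['J', 'S', 'Y']})
df2 = pd.DataFrame({'name': ['John', 'Smith', 'Nathan'],
                    'initials': ['J', 'S', 'N']})

# how='left' keeps every row of df (and only those), in df's order.
df3 = (df.merge(df2, on='initials', how='left')
         .drop('initials', axis=1)
         .fillna(''))
print(df3)
```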

2 answers

solution1
3 2016-07-29 08:48:28

solution2
2 ACCEPTED 2016-07-29 04:01:46
