Pandas - check if subset of dataframe is in another dataframe

Question

I have the following dataframe which I'll call 'names':

date       name    code  
6/1/2018   A       5     
6/1/2018   B       5     
7/1/2018   A       5     
7/1/2018   B       5

I have the following df which I need to alter:

date       name    comment   
5/1/2018   A       'Good'    
6/1/2018   A       'Good'    
6/1/2018   B       'Good'    
6/1/2018   C       'Good'    
7/1/2018   A       'Good'    
7/1/2018   B       'Good'

I need to change the comment to 'Bad' if the name isn't in the names dataframe for that date.

Right now I have:

df['comment'] = np.where(~df['name'].isin(names['name']), 'Bad', df['comment'])

Though obviously that doesn't work because it doesn't take into account name AND date.

Final output:

date       name    comment   
5/1/2018   A       'Bad'     
6/1/2018   A       'Good'    
6/1/2018   B       'Good'    
6/1/2018   C       'Bad'     
7/1/2018   A       'Good'    
7/1/2018   B       'Good'

The first row was changed because there's no A entry for 5/1 in the names dataframe. The C row was changed because there's no C entry for 6/1 in the names df (or rather no C entry at all).

Note: Both dataframes (names and df) are larger than I've shown, both row and column-wise.
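
For anyone who wants to reproduce this, here is a minimal sketch of the two frames and the name-only check. It assumes the dates are plain strings and that the comments literally include the quote characters, as in the tables above; neither is confirmed beyond what the tables show.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the tables in the question (dates kept as plain strings).
names = pd.DataFrame({
    "date": ["6/1/2018", "6/1/2018", "7/1/2018", "7/1/2018"],
    "name": ["A", "B", "A", "B"],
    "code": [5, 5, 5, 5],
})
df = pd.DataFrame({
    "date": ["5/1/2018", "6/1/2018", "6/1/2018",
             "6/1/2018", "7/1/2018", "7/1/2018"],
    "name": ["A", "A", "B", "C", "A", "B"],
    "comment": ["'Good'"] * 6,
})

# Name-only check from the question: it ignores the date entirely,
# so it can only catch names that never appear in `names` at all.
name_only = np.where(~df["name"].isin(names["name"]), "'Bad'", df["comment"])
print(list(name_only))
```

Only the C row is flagged here. The 5/1/2018 A row slips through because A exists in names on other dates.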

Answer 1

Performant solution using pd.Index.get_indexer:

# Index names by (date, name); get_indexer returns -1 for pairs it cannot find
v = names.set_index(['date', 'name'])
m = v.index.get_indexer(pd.MultiIndex.from_arrays([df.date, df.name])) == -1
df.loc[m, 'comment'] = "'Bad'"

print(df)
       date name comment
0  5/1/2018    A   'Bad'
1  6/1/2018    A  'Good'
2  6/1/2018    B  'Good'
3  6/1/2018    C   'Bad'
4  7/1/2018    A  'Good'
5  7/1/2018    B  'Good'
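
One caveat, since the question says the real frames are larger: as far as I know, get_indexer expects the index being searched to be unique and raises if the same (date, name) pair appears twice. A hedged sketch of one way around that, with a duplicated row added to names purely for illustration (the variable names `keys`, `lookup`, `probe` and `missing` are mine):

```python
import pandas as pd

# Sample data from the question, plus one repeated (date, name) pair in `names`.
names = pd.DataFrame({
    "date": ["6/1/2018", "6/1/2018", "7/1/2018", "7/1/2018", "7/1/2018"],
    "name": ["A", "B", "A", "B", "B"],
    "code": [5, 5, 5, 5, 6],
})
df = pd.DataFrame({
    "date": ["5/1/2018", "6/1/2018", "6/1/2018",
             "6/1/2018", "7/1/2018", "7/1/2018"],
    "name": ["A", "A", "B", "C", "A", "B"],
    "comment": ["'Good'"] * 6,
})

keys = ["date", "name"]
# Collapse repeated key pairs so the lookup index is unique.
lookup = pd.MultiIndex.from_frame(names[keys].drop_duplicates())
probe = pd.MultiIndex.from_frame(df[keys])

# -1 marks a (date, name) pair with no match in `names`.
missing = lookup.get_indexer(probe) == -1
df.loc[missing, "comment"] = "'Bad'"
print(df)
```

Dropping duplicates on the key columns only is enough here, because the check cares about whether a pair exists, not about which code it carries.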

Alternatively, do a LEFT OUTER merge, determine missing values in the right DataFrame, and use that to mask rows:

m = df.merge(names, how='left', on=['date', 'name']).code.isna()
df['comment'] = df['comment'].mask(m, '\'Bad\'')

print(df)
       date name comment
0  5/1/2018    A   'Bad'
1  6/1/2018    A  'Good'
2  6/1/2018    B  'Good'
3  6/1/2018    C   'Bad'
4  7/1/2018    A  'Good'
5  7/1/2018    B  'Good'
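
The merge version relies on code never being NaN in names and on names having one row per (date, name). If either is in doubt, a variant using merge's indicator flag may be safer. This is a sketch under those assumptions, with a NaN code added to the sample data only to show the difference:

```python
import pandas as pd

# Sample data from the question, but with a missing code for (6/1/2018, B).
names = pd.DataFrame({
    "date": ["6/1/2018", "6/1/2018", "7/1/2018", "7/1/2018"],
    "name": ["A", "B", "A", "B"],
    "code": [5, float("nan"), 5, 5],
})
df = pd.DataFrame({
    "date": ["5/1/2018", "6/1/2018", "6/1/2018",
             "6/1/2018", "7/1/2018", "7/1/2018"],
    "name": ["A", "A", "B", "C", "A", "B"],
    "comment": ["'Good'"] * 6,
})

keys = ["date", "name"]

# Checking `code` for NaN also flags (6/1/2018, B), which does exist in `names`.
naive = df.merge(names, how="left", on=keys)["code"].isna()

# Merge on the de-duplicated keys only and read the match status from "_merge".
right = names[keys].drop_duplicates()
merged = df.merge(right, how="left", on=keys, indicator=True)

# to_numpy() applies the mask by position, so df's own index labels do not matter.
missing = (merged["_merge"] == "left_only").to_numpy()
df["comment"] = df["comment"].mask(missing, "'Bad'")
print(df)
```

With indicator=True the result no longer depends on the contents of any non-key column in names.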

Answer 2

You can use pd.Index.isin followed by pd.Series.where:

idx_cols = ['date', 'name']
mask = df.set_index(idx_cols).index.isin(names.set_index(idx_cols).index)

# Assign the result back; inplace=True on a selected column may not update df in newer pandas
df['comment'] = df['comment'].where(mask, "'Bad'")

print(df)

       date name comment
0  5/1/2018    A   'Bad'
1  6/1/2018    A  'Good'
2  6/1/2018    B  'Good'
3  6/1/2018    C   'Bad'
4  7/1/2018    A  'Good'
5  7/1/2018    B  'Good'
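
Since the real frames have more rows and columns, it may be convenient to wrap the isin approach in a small helper. The function below is hypothetical (the name `flag_missing` and its parameters are mine, not from pandas) and returns a copy rather than modifying df:

```python
import pandas as pd


def flag_missing(df, names, keys, column="comment", value="'Bad'"):
    """Return a copy of df with `column` set to `value` wherever the
    combination of `keys` does not occur in `names`."""
    present = pd.MultiIndex.from_frame(df[keys]).isin(
        pd.MultiIndex.from_frame(names[keys])
    )
    out = df.copy()
    # where() keeps existing values where `present` is True, replaces the rest.
    out[column] = out[column].where(present, value)
    return out


names = pd.DataFrame({
    "date": ["6/1/2018", "6/1/2018", "7/1/2018", "7/1/2018"],
    "name": ["A", "B", "A", "B"],
    "code": [5, 5, 5, 5],
})
df = pd.DataFrame({
    "date": ["5/1/2018", "6/1/2018", "6/1/2018",
             "6/1/2018", "7/1/2018", "7/1/2018"],
    "name": ["A", "A", "B", "C", "A", "B"],
    "comment": ["'Good'"] * 6,
})

result = flag_missing(df, names, ["date", "name"])
print(result)
```

Extra columns in either frame are ignored, because only the key columns are used to build the two indexes.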

2 answers

solution1
2 ACCEPTED 2018-11-12 17:46:15

solution2
1 2018-11-12 17:49:24
