I've searched around but couldn't find the answer I was looking for, so apologies if this is a repeat question.
I have two DataFrames: df1 holds transaction data and df2 acts as a lookup key. The values in df1['code'] correspond to a code column in df2.
If a transaction's code in df1 also appears in df2, I'd like to mark that df1 row as valid in a new column. If the code is not in df2, the same column should mark it as invalid.
I understand how I might do this with a 'for' loop, but my understanding is I should learn how to use pandas without relying on that.
Thanks in advance for the help!
Use numpy.where() together with Series.isin() (substitute your own column names for 'code'):
import numpy as np
df1['new_col'] = np.where(df1['code'].isin(df2['code']), 'VALID', 'INVALID')
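A self-contained sketch of the same approach, with the plain Python loop the question wanted to avoid placed alongside for comparison. The sample values are made up for illustration; only the 'code' and 'new_col' names come from the answer.

```python
import numpy as np
import pandas as pd

# Made-up sample data: df1 holds transactions, df2 holds the valid codes.
df1 = pd.DataFrame({'code': [5, 12, 7, 14], 'transaction': [0, 1, 2, 3]})
df2 = pd.DataFrame({'code': [12, 13, 14]})

# Vectorized: isin() builds a True/False mask for the whole column,
# and np.where() turns True into 'VALID' and False into 'INVALID'.
df1['new_col'] = np.where(df1['code'].isin(df2['code']), 'VALID', 'INVALID')

# Loop-based equivalent, shown only for comparison.
valid_codes = set(df2['code'])
loop_labels = ['VALID' if c in valid_codes else 'INVALID' for c in df1['code']]

print(df1)
print(list(df1['new_col']) == loop_labels)
```

Both produce the same labels; the vectorized line does the membership test for the entire column at once instead of checking each row in Python, which is usually faster on large frames.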
Sample DataFrames:
>>> import pandas as pd
>>> import numpy as np
>>> df1 = pd.DataFrame({'code':range(5,15), 'transaction':range(10)})
>>> df2 = pd.DataFrame({'code':range(12,22), 'transaction':range(7,17)})
>>> df1
   code  transaction
0     5            0
1     6            1
2     7            2
3     8            3
4     9            4
5    10            5
6    11            6
7    12            7
8    13            8
9    14            9
>>> df2
   code  transaction
0    12            7
1    13            8
2    14            9
3    15           10
4    16           11
5    17           12
6    18           13
7    19           14
8    20           15
9    21           16
>>> df1['new_col'] = np.where(df1['code'].isin(df2['code']), 'VALID', 'INVALID')
>>> df1
   code  transaction  new_col
0     5            0  INVALID
1     6            1  INVALID
2     7            2  INVALID
3     8            3  INVALID
4     9            4  INVALID
5    10            5  INVALID
6    11            6  INVALID
7    12            7    VALID
8    13            8    VALID
9    14            9    VALID
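An alternative not used in the answer: a left merge with indicator=True, which records whether each df1 code was found in df2. This is a sketch on made-up sample data; it mainly pays off when you also want to pull extra columns from df2 into df1.

```python
import pandas as pd

# Made-up sample data.
df1 = pd.DataFrame({'code': [5, 12, 7, 14], 'transaction': [0, 1, 2, 3]})
df2 = pd.DataFrame({'code': [12, 13, 14]})

# De-duplicate the keys so a code listed twice in df2 cannot duplicate df1 rows.
keys = df2[['code']].drop_duplicates()

# indicator=True adds a '_merge' column: 'both' if the code matched, 'left_only' if not.
merged = df1.merge(keys, on='code', how='left', indicator=True)
merged['new_col'] = (merged['_merge'] == 'both').map({True: 'VALID', False: 'INVALID'})
merged = merged.drop(columns='_merge')

print(merged)
```

Unlike the in-place np.where assignment, merge returns a new DataFrame with a fresh default index, so reassign or reset the index if df1 had a meaningful one.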