I have a data frame that looks something like this:
import pandas as pd
df= pd.DataFrame({'ID1':['A','B','C','D','E'],\
'ID2':['B','A','D','C','E'],\
'Account':['94000','94500','94000','18300','94500'],\
'Amount':[100,-100,50,-50,100],\
'Match':['-','-','-','-','-']})
df
I am struggling to find the most efficient way to identify when an item in 'ID1' is also present in 'ID2' on a row with a particular value of Account. For example, the condition Account == '94500' should yield:
df= pd.DataFrame({'ID1':['A','B','C','D','E'],\
'ID2':['B','A','D','C','E'],\
'Account':['94000','94500','94000','18300','94500'],\
'Amount':[100,-100,50,-50,200],\
'Match':['True','-','-','-','-']})
df
i.e. only the first row should be tagged, because its ID1 value A appears in ID2 on a row where Account is 94500.
You can use pandas apply:
df['Match'] = df['ID1'].apply(lambda x: any((df['ID2']==x) & (df['Account']=='94500')))
Which gives:
  Account  Amount ID1 ID2  Match
0   94000     100   A   B   True
1   94500    -100   B   A  False
2   94000      50   C   D  False
3   18300     -50   D   C  False
4   94500     100   E   E   True
In words, the logic is: "For each element in ID1 (apply), check whether there is at least one row of the dataframe (any) where ID2 equals that element and Account == '94500'."
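To make that logic concrete, here is a minimal sketch that evaluates the condition for a single ID1 value by hand. It reuses the question's sample data; the names x and row_hits are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID1': ['A', 'B', 'C', 'D', 'E'],
                   'ID2': ['B', 'A', 'D', 'C', 'E'],
                   'Account': ['94000', '94500', '94000', '18300', '94500'],
                   'Amount': [100, -100, 50, -50, 100]})

# What the lambda computes for one element of ID1, here 'A'
x = 'A'
row_hits = (df['ID2'] == x) & (df['Account'] == '94500')

print(row_hits.tolist())  # [False, True, False, False, False]
print(row_hits.any())     # True: row 1 has ID2 == 'A' and Account == '94500'
```

apply repeats this check once per row, so the whole frame is scanned for every element of ID1.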
Your explanation is a bit unclear, but I think you want this:
mask = df[df.Account == '94500'].ID2
df.loc[df.ID1.isin(mask),"Match"] = True
  Account  Amount ID1 ID2 Match
0   94000     100   A   B  True
1   94500    -100   B   A     -
2   94000      50   C   D     -
3   18300     -50   D   C     -
4   94500     100   E   E  True
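Note that assigning through .loc leaves the '-' placeholders in place, so Match ends up holding a mix of strings and booleans. If a plain True/False column is acceptable (an assumption, since the question uses '-' as the default), the result of isin can be assigned directly. A minimal sketch, with ids_on_account as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'ID1': ['A', 'B', 'C', 'D', 'E'],
                   'ID2': ['B', 'A', 'D', 'C', 'E'],
                   'Account': ['94000', '94500', '94000', '18300', '94500'],
                   'Amount': [100, -100, 50, -50, 100]})

# ID2 values found on rows with the wanted account
ids_on_account = df.loc[df['Account'] == '94500', 'ID2']

# Boolean column: True where ID1 is among those ID2 values
df['Match'] = df['ID1'].isin(ids_on_account)

print(df['Match'].tolist())  # [True, False, False, False, True]
```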
Also, comparing the speed of both answers on the sample data, just for fun:
%timeit -r 10 df['Match'] = df['ID1'].apply(lambda x: any((df['ID2']==x) & (df['Account']=='94500')))
100 loops, best of 10: 4.21 ms per loop
%timeit -r 10 df.loc[df.ID1.isin(df[df.Account == '94500'].ID2),"Match"] = True
1000 loops, best of 10: 1.48 ms per loop
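Five rows say little about scaling. The apply version rescans the whole frame for every row, while isin builds its lookup once, so the gap should widen on larger data. Below is a rough sketch for checking this outside IPython with the standard timeit module. The synthetic frame big, its size and the helper names with_apply and with_isin are assumptions made for illustration, and no timings are quoted because they depend on the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (hypothetical): 1000 unique IDs, shuffled for ID2
rng = np.random.default_rng(0)
n = 1000
ids = [f"id{i}" for i in range(n)]
big = pd.DataFrame({'ID1': ids,
                    'ID2': rng.permutation(ids),
                    'Account': rng.choice(['94000', '94500', '18300'], size=n)})

def with_apply(frame):
    # One full scan of the frame per element of ID1
    return frame['ID1'].apply(
        lambda x: ((frame['ID2'] == x) & (frame['Account'] == '94500')).any())

def with_isin(frame):
    # Filter once, then a single membership test
    return frame['ID1'].isin(frame.loc[frame['Account'] == '94500', 'ID2'])

# Both approaches should agree row by row
same = with_apply(big).tolist() == with_isin(big).tolist()

t_apply = timeit.timeit(lambda: with_apply(big), number=1)
t_isin = timeit.timeit(lambda: with_isin(big), number=1)
print(f"same result: {same}, apply: {t_apply:.4f}s, isin: {t_isin:.4f}s")
```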
Update to address a new use case
You mentioned that you also have cases where you want to filter on two columns. Again, I am not sure I understood correctly, but here is my take on it. Suppose you have another variable Prod and you want to select on both Account == '94500' and Prod == 6901.
In this case:
df= pd.DataFrame({'ID1':['A','B','C','D','E'],\
'ID2':['B','A','D','C','E'],\
'Account':['94000','94500','94000','18300','94500'],\
'Amount':[100,-100,50,-50,100],\
'Match':['-','-','-','-','-'],\
'Prod':[0,6901,0,0,0]
})
mask = df[(df.Account == '94500') & (df.Prod == 6901)].ID2
df.loc[df.ID1.isin(mask),"Match"] = True
Result:
  Account  Amount ID1 ID2 Match  Prod
0   94000     100   A   B  True     0
1   94500    -100   B   A     -  6901
2   94000      50   C   D     -     0
3   18300     -50   D   C     -     0
4   94500     100   E   E     -     0
Now only 'A' in ID1 matches: the second row is the only one that satisfies both conditions, and its ID2 is 'A', so only the first row is tagged.
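If the number of filter columns keeps growing, the same idea can be wrapped in a small helper. This is only a sketch: tag_matches and its conditions argument are hypothetical names, and it assumes every condition is a simple equality test.

```python
import pandas as pd

def tag_matches(frame, conditions):
    """Return a boolean Series: True where ID1 appears in ID2
    on some row that satisfies every column == value condition."""
    keep = pd.Series(True, index=frame.index)
    for column, value in conditions.items():
        keep &= frame[column] == value
    return frame['ID1'].isin(frame.loc[keep, 'ID2'])

df = pd.DataFrame({'ID1': ['A', 'B', 'C', 'D', 'E'],
                   'ID2': ['B', 'A', 'D', 'C', 'E'],
                   'Account': ['94000', '94500', '94000', '18300', '94500'],
                   'Amount': [100, -100, 50, -50, 100],
                   'Prod': [0, 6901, 0, 0, 0]})

result = tag_matches(df, {'Account': '94500', 'Prod': 6901})
print(result.tolist())  # [True, False, False, False, False]
```

Passing only {'Account': '94500'} reproduces the single-column case, where the first and last rows are tagged.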