I have two dataframes with different sizes, df1
and df2
. I'm trying to check if the values from df1
exist in a column in df2
and return True
or False
in new columns in df1
.
The first dataframe is my reference. It's extracted from an xls file.
df1.head(10)
Out[29]:
PO Number Sales Document SO DO Document Number
0 3620556930 9001724124.0 4001458660.0 8001721322.0 1500017748
1 3620556930 9001723883.0 4001458865.0 8001721037.0 1500017540
2 3620556930 9001723884.0 4001459374.0 8001721038.0 1500017541
3 3620556930 9001723885.0 4001458101.0 8001721043.0 1500017542
4 3620547728 9001721907.0 4001457180.0 8001719172.0 1500015786
5 3620556930 9001721908.0 4001457724.0 8001719173.0 1500015787
6 TT030720 nan nan nan 700001897
7 3620518726 9600008914.0 5600008655.0 5600008655.0 1500008725
8 3620518726 9600008912.0 5600008653.0 5600008653.0 1500008723
9 3620518726 9600008913.0 5600008654.0 5600008654.0 1500008724
The second dataframe is from a table I scraped from a website.
df2.head(10)
Out[32]:
PO No Doc Type SUS Doc No GR_GA Inv_SO_DO Doc Date
0 3620556930 Purchase Order 8001294233 CSL 27.08.2020
1 3620556930 Goods Receipt 7903307400 Goods Received 4001457724 04.09.2020
2 3620556930 Goods Receipt 7903307457 Goods Accepted 4001457724 04.09.2020
3 3620556930 Payment Request 3102053949 CCM Invoice 9001721908 23.09.2020
4 3620556930 Goods Receipt 7903333326 Goods Received 4001458660 29.09.2020
5 3620556930 Goods Receipt 7903333325 Goods Received 4001458101 29.09.2020
6 3620556930 Goods Receipt 7903333322 Goods Received 4001458865 29.09.2020
7 3620556930 Goods Receipt 7903333327 Goods Accepted 4001458660 29.09.2020
8 3620556930 Goods Receipt 7903333324 Goods Received 4001458660 29.09.2020
9 3620556930 Goods Receipt 7903333329 Goods Accepted 4001458865 29.09.2020
My thought process in getting the output is as below:
df1
, named df1['GR', 'GA', 'Inv']
.df1['SO']
and df1['DO']
to check if they exist in df2['Inv_SO_DO']
.df2['GR_GA']
if it's a Goods Receipt, Goods Acceptance or Invoice. I'll then return True
or False
in columns df1['GR', 'GA', 'Inv']
depending on this check. I've tried a for
loop as below for to create a list of values to be added for ['GA']
but it just gave me a list of Falses.
ga = []
t1 = x.iloc[:,2].values
t2 = y.iloc[:,4].values
t3 = y.iloc[:,3].values
for i in t1:
for j in t2:
for k in t3:
if i == j and k == 'Goods Receipt':
ga.append('True')
else:
ga.append('False')
The closest I got to a solution is from another question here . I tried the code and modified it but it didn't turn out right as well. Either that, or I'm doing the code from the link wrong.
Any advise would be most welcomed!
Output desired:
df1.head(4)
Out[43]:
PO Number Sales Document SO DO Document Number GR GA Inv
0 3620556930 9001724124.0 4001458660.0 8001721322.0 1500017748 True True True
1 3620556930 9001723883.0 4001458865.0 8001721037.0 1500017540 True False False
2 3620556930 9001723884.0 4001459374.0 8001721038.0 1500017541 False False False
3 3620556930 9001723885.0 4001458101.0 8001721043.0 1500017542 True True False
One way you could do this is the following:
df1
and df2
on either DO
or SO
(from the left) to Inv_SO_DO
(from the right). Note that in your case each SO
value corresponds to multiple rows in df2
, so perhaps you'll need to amend the merging logic a bit (eg latest appearing row in df2
?)GR_GA
column with pd.get_dummies()
and then concatenate it with the columns you need from the merged df, after casting the dummies into boolean
type.For example:
m = pd.concat([df1.merge(df2, left_on='SO', right_on='Inv_SO_DO', how='inner'),
df1.merge(df2, left_on='DO', right_on='Inv_SO_DO', how='inner')
])
desired_cols = ["PO_Number", "Sales_Document", "SO", "DO", "Document_Number", "CSL", "GoodsAccepted", "GoodsReceived"]
pd.concat([m, pd.get_dummies(m['GR_GA']).astype(bool)], axis=1)[desired_cols]
This gives a result as follows:
PO_Number Sales_Document SO DO Document_Number CSL GoodsAccepted GoodsReceived CCMInvoice
0 3620556930 9001724124 4001458660 8001721322 1500017748 False False True False
1 3620556930 9001724124 4001458660 8001721322 1500017748 False True False False
2 3620556930 9001724124 4001458660 8001721322 1500017748 False False True False
3 3620556930 9001723883 4001458865 8001721037 1500017540 False False True False
Again, note that because each SO
and DO
in the example df1
that you provided can match more than 1 rows in df2
, perhaps you will need to add some custom logic on how to merge.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.