I'm working with an order-process data set that contains two columns, Order_ID and Transaction_Phase. An order can go through several steps both before and after it is first booked.
For each order, I want to keep all rows up to and including the "Deal approved" row and drop every row after it. I am only interested in what happened up to the approval, so I don't need anything that follows it.
Order_ID Tranaction_Phase
529334333 Quote
529334333 Deal approved
529334333 Rejected deal
470660845 Quote
470660845 Deal approved
470660845 Reject Deal
I want my output to look like the following:
Order_ID Tranaction_Phase
529334333 Quote
529334333 Deal approved
470660845 Quote
470660845 Deal approved
Can anyone help steer me in the right direction (packages, logic, documentation, etc.)? I am using Python to accomplish this.
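# For each order, look up the index label of its first 'Deal approved' row,
# then keep only the rows at or before that label.
df[df.index <= df.groupby('Order_ID')['Tranaction_Phase'].transform(lambda x: x.index[x == 'Deal approved'][0])]
Out[649]:
Order_ID Tranaction_Phase
0 529334333 Quote
1 529334333 Deal approved
3 470660845 Quote
4 470660845 Deal approved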
In [36]: df.groupby('Order_ID', group_keys=False) \
.apply(lambda x: x.loc[:x['Tranaction_Phase'].eq('Deal approved').idxmax()])
Out[36]:
Order_ID Tranaction_Phase
3 470660845 Quote
4 470660845 Deal approved
0 529334333 Quote
1 529334333 Deal approved
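Both answers above assume that every order contains a 'Deal approved' row. As a rough alternative sketch (not taken from either answer), a cumulative count of approvals per order can be used to build a boolean mask; orders that never reach approval are then kept in full. The helper name keep_until_approval and its approved parameter are made up for illustration, and the column name keeps the question's spelling.

```python
import pandas as pd

# Sample data copied from the question (column spelling kept as posted).
df = pd.DataFrame({
    "Order_ID": [529334333, 529334333, 529334333,
                 470660845, 470660845, 470660845],
    "Tranaction_Phase": ["Quote", "Deal approved", "Rejected deal",
                         "Quote", "Deal approved", "Reject Deal"],
})


def keep_until_approval(frame, approved="Deal approved"):
    """Hypothetical helper: keep each order's rows up to and
    including its first approval row."""
    is_approved = frame["Tranaction_Phase"].eq(approved).astype(int)
    # Running count of approvals within each order, minus the current row,
    # gives the number of approvals seen strictly before this row.
    approvals_before = is_approved.groupby(frame["Order_ID"]).cumsum() - is_approved
    return frame[approvals_before.eq(0)]


result = keep_until_approval(df)
print(result)
```

Because the mask is applied to the original frame, the rows keep their original order and index labels, and no per-group lambda is needed.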