
Dropping rows on a condition

I'm working with an order-process data set that contains two columns, Order_ID and Tranaction_Phase. An order can go through several steps both before and after it is first booked.

For my current problem, I want to keep every row of an order up to and including its "Deal approved" row, and drop any rows that come after it. I am only interested in what happened up to the approval, so I don't need anything that follows it.

 Order_ID         Tranaction_Phase 
 529334333         Quote 
 529334333         Deal approved 
 529334333         Rejected deal 
 470660845         Quote
 470660845         Deal approved 
 470660845         Reject Deal 

I want my output to look like the following:

 Order_ID         Tranaction_Phase 
 529334333         Quote 
 529334333         Deal approved 
 470660845         Quote
 470660845         Deal approved 

Can anyone steer me in the right direction (packages, logic, documentation, etc.)? I am using Python to accomplish this.

In [649]: df[df.index<=df.groupby('Order_ID')['Tranaction_Phase'].transform(lambda x:x.index[x=='Dealapproved'])]
Out[649]: 
    Order_ID Tranaction_Phase
0  529334333            Quote
1  529334333     Dealapproved
3  470660845            Quote
4  470660845     Dealapproved
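
The one-liner above compares index labels, so it appears to rely on the index being in row order and on each order having exactly one approved row (and, in that answer's copy of the data, on the phase being spelled 'Dealapproved' without a space). As a hedged alternative, not the answerer's method, here is a minimal sketch that rebuilds the question's sample data and keeps a row only if no approval has occurred earlier in the same order. The names `is_approved`, `approvals_before`, and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the question (column name spelled as in the post).
df = pd.DataFrame({
    "Order_ID": [529334333, 529334333, 529334333,
                 470660845, 470660845, 470660845],
    "Tranaction_Phase": ["Quote", "Deal approved", "Rejected deal",
                         "Quote", "Deal approved", "Reject Deal"],
})

# 1 on approval rows, 0 elsewhere.
is_approved = df["Tranaction_Phase"].eq("Deal approved").astype(int)

# Running count of approvals within each order, minus the current row,
# i.e. how many approvals happened strictly before this row.
approvals_before = is_approved.groupby(df["Order_ID"]).cumsum() - is_approved

# Keep rows with no earlier approval: everything up to and including
# the first 'Deal approved' row of each order.
result = df[approvals_before.eq(0)]
print(result)
```

With this mask, an order that was never approved keeps all of its rows, and an order with several approvals is cut after the first one. Whether that is the desired behaviour depends on the data, so treat it as an assumption.
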
In [36]: df.groupby('Order_ID', group_keys=False) \
           .apply(lambda x: x.loc[:x['Tranaction_Phase'].eq('Deal approved').idxmax()])
Out[36]:
    Order_ID Tranaction_Phase
3  470660845            Quote
4  470660845    Deal approved
0  529334333            Quote
1  529334333    Deal approved
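
One caveat with the `idxmax` approach that may be worth checking: on a boolean Series with no True values, `idxmax()` returns the first label, so an order that was never approved would be cut down to its first row. The sketch below illustrates this on a small hypothetical order (the index labels 6 and 7 and the variable names are made up for the example) and adds a guard. Keeping all rows when there is no approval is an assumption about the desired behaviour, not something stated in the question.

```python
import pandas as pd

# Hypothetical order with no 'Deal approved' row.
phases = pd.Series(["Quote", "Rejected deal"], index=[6, 7])

is_approved = phases.eq("Deal approved")

# All values are False, so idxmax() falls back to the first label.
first_label = is_approved.idxmax()

# Without a guard, the slice would keep only the first row.
unguarded = phases.loc[:first_label]

# Guarded version: slice only when an approval actually exists
# (assumption: unapproved orders should be kept whole).
kept = phases.loc[:first_label] if is_approved.any() else phases

print(first_label)
print(unguarded.tolist())
print(kept.tolist())
```

The same `any()` check could be placed inside the lambda passed to `apply` if unapproved orders can occur in the real data.
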

