Dropping rows on a condition

Question

I'm working with an order process data set that contains two columns, Order_ID and Tranaction_Phase (spelled that way in the data). An order can go through several steps both before and after it is booked.

For each order, I want to keep every row up to and including the "Deal approved" row and drop every row that comes after it. I am only interested in what happened up to the approval, so I don't need anything that follows it.

 Order_ID         Tranaction_Phase 
 529334333         Quote 
 529334333         Deal approved 
 529334333         Rejected deal 
 470660845         Quote
 470660845         Deal approved 
 470660845         Reject Deal

I want my output to look like the following:

 Order_ID         Tranaction_Phase 
 529334333         Quote 
 529334333         Deal approved 
 470660845         Quote
 470660845         Deal approved

Can anyone steer me in the right direction (packages, logic, documentation, etc.)? I am using Python.

Answer 1

# Keep each row whose index is <= the index of its order's 'Deal approved' row
df[df.index <= df.groupby('Order_ID')['Tranaction_Phase'].transform(lambda x: x.index[x == 'Deal approved'][0])]
Out[649]: 
    Order_ID Tranaction_Phase
0  529334333            Quote
1  529334333    Deal approved
3  470660845            Quote
4  470660845    Deal approved
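
To run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same index comparison. The `df` construction and the names `approval_idx` and `result` are assumptions for illustration, not part of the original answer. It also assumes the default integer index, rows already in chronological order, and exactly one "Deal approved" row per order.

```python
import pandas as pd

# Sample data copied from the question (column name spelled as in the data).
df = pd.DataFrame({
    "Order_ID": [529334333] * 3 + [470660845] * 3,
    "Tranaction_Phase": ["Quote", "Deal approved", "Rejected deal",
                         "Quote", "Deal approved", "Reject Deal"],
})

# For every row, look up the index label of its order's approval row.
# Assumes each order has at least one 'Deal approved' row; [0] raises otherwise.
approval_idx = df.groupby("Order_ID")["Tranaction_Phase"].transform(
    lambda s: s.index[s == "Deal approved"][0]
)

# Keep rows positioned at or before the approval row of their own order.
result = df[df.index <= approval_idx]
print(result)
```

Because the comparison relies on index labels, it is probably worth calling `df.reset_index(drop=True)` first if the frame has been filtered or re-sorted.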

Answer 2

In [36]: df.groupby('Order_ID', group_keys=False) \
           .apply(lambda x: x.loc[:x['Tranaction_Phase'].eq('Deal approved').idxmax()])
Out[36]:
    Order_ID Tranaction_Phase
3  470660845            Quote
4  470660845    Deal approved
0  529334333            Quote
1  529334333    Deal approved
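
One caveat worth checking: `idxmax()` on an all-False mask returns the first label, so an order that was never approved would be cut down to its first row. If that matters, a vectorized variant along the following lines may help. It is a sketch, not the method from the answer above: it counts the approvals seen strictly before each row within an order and keeps the rows where that count is zero, so unapproved orders are kept in full and the original row order is preserved. The sample frame, the extra order `111111111`, and the names `approved`, `seen_before`, and `result` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    # 111111111 is a hypothetical order with no approval, added for illustration.
    "Order_ID": [529334333] * 3 + [470660845] * 3 + [111111111] * 2,
    "Tranaction_Phase": ["Quote", "Deal approved", "Rejected deal",
                         "Quote", "Deal approved", "Reject Deal",
                         "Quote", "Rejected deal"],
})

# 1 where the row is an approval, 0 elsewhere.
approved = df["Tranaction_Phase"].eq("Deal approved").astype(int)

# Running count of approvals per order, minus the current row's own flag,
# gives the number of approvals seen strictly before each row.
seen_before = approved.groupby(df["Order_ID"]).cumsum() - approved

# Keep rows with no earlier approval in the same order.
result = df[seen_before.eq(0)]
print(result)
```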
