Select the previous row of a certain value in a column

My inventory data contains the columns sale_date, saleID, region, and product. Each product can be sold multiple times within a region, but each sale has a unique saleID.

Now I have a list of saleID values, and I need to select the rows that contain them. I also need to find the previous saleID (and its whole row) for the same product in the same region. In other words, I need to group by region and product, locate each listed saleID, and also take the row just before it within its group.

df
sale_date     saleID       region     product     
2017-05-01    A12           Asia       car1
2018-03-05    B21           Asia       car1
2018-04-01    C23           Asia       car1
2018-09-06    C21           Canada     car2
2019-01-01    E11           Canada     car2
2019-03-02    E23           USA        car3
2019-01-02    G41           USA        car3
2015-02-02    H11           Mexico     car4
2015-02-03    I14           Mexico     car4

Expected output

saleID_list = ["B21", "E11", "I14"]
sale_date     saleID       region     product     
2017-05-01    A12           Asia       car1
2018-03-05    B21           Asia       car1
2018-09-06    C21           Canada     car2
2019-01-01    E11           Canada     car2
2015-02-02    H11           Mexico     car4
2015-02-03    I14           Mexico     car4

I know we can select the rows with the listed saleID using

df.loc[df['saleID'].isin(saleID_list)]

But how can I get the previous row within each group? I tried something like the following using shift, but it raises AttributeError: 'DataFrameGroupBy' object has no attribute 'loc':

df.groupby(['region', 'product']).loc[df['saleID'].isin(saleID_list)].shift(1)

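Use DataFrameGroupBy.shift to build a helper Series, test it with Series.isin, and combine the result with the original mask on the saleID column using | (bitwise OR). Shifting by -1 gives each row the saleID of the next row in its group, so a row is kept when either its own saleID or the one that follows it is in the list: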

saleID_list = ["B21", "E11", "I14"]

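# saleID of the next row within each (region, product) group
s = df.groupby(['region', 'product'])['saleID'].shift(-1)

# keep the listed sales, plus the rows whose next sale is listed
df = df[df['saleID'].isin(saleID_list) | s.isin(saleID_list)]
print(df)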
    sale_date saleID  region product
0  2017-05-01    A12    Asia    car1
1  2018-03-05    B21    Asia    car1
3  2018-09-06    C21  Canada    car2
4  2019-01-01    E11  Canada    car2
7  2015-02-02    H11  Mexico    car4
8  2015-02-03    I14  Mexico    car4
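
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, using the sample data from the question. One step is an assumption that is not part of the answer above: the rows are sorted by sale_date first, since shift only looks at row order and the USA rows in the sample are not in chronological order. The names ordered, next_id, mask and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sale_date": ["2017-05-01", "2018-03-05", "2018-04-01", "2018-09-06", "2019-01-01",
                  "2019-03-02", "2019-01-02", "2015-02-02", "2015-02-03"],
    "saleID": ["A12", "B21", "C23", "C21", "E11", "E23", "G41", "H11", "I14"],
    "region": ["Asia", "Asia", "Asia", "Canada", "Canada", "USA", "USA", "Mexico", "Mexico"],
    "product": ["car1", "car1", "car1", "car2", "car2", "car3", "car3", "car4", "car4"],
})
saleID_list = ["B21", "E11", "I14"]

# Assumption (not in the original answer): order rows chronologically so that
# "previous row" means "previous sale". ISO-formatted date strings sort correctly.
ordered = df.sort_values("sale_date", kind="mergesort")

# For each row, the saleID of the next row in the same (region, product) group.
next_id = ordered.groupby(["region", "product"])["saleID"].shift(-1)

# Keep the listed sales plus the rows whose next sale in the group is listed.
mask = ordered["saleID"].isin(saleID_list) | next_id.isin(saleID_list)

# Restore the original row order for display.
result = ordered[mask].sort_index()
print(result)
```

With this sample list the sketch should keep the same six rows as the output above. If your data is already in the order you care about, the sort_values step can be dropped.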
