Pandas: Create new column and populate with value from previous row based on conditions

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'KEY': ['1','1','1','1','1','1','1','2','2'],
    'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08','2020-01-08','2020-01-08','2020-02-01','2020-02-01'],
    'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'],
    'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS','STAPLES','TAPE','PENCILS','PENS'],
})

KEY DATE        ENDNO ITEM
1   2020-01-01  1000  PAPERCLIPS
1   2020-01-01  1000  BINDERS   
1   2020-01-01  1000  STAPLES   
1   2020-01-08  2000  PAPERCLIPS
1   2020-01-08  2000  BINDERS   
1   2020-01-08  2000  STAPLES
1   2020-01-08  2000  TAPE
2   2020-02-01  400   PENCILS   
2   2020-02-01  400   PENS

I need to add a new column called "STARTNO" and populate it based on multiple conditions:

if KEY <> KEY of row above, STARTNO = 0
else
   (if DATE = DATE of row above, STARTNO = STARTNO of row above
    else STARTNO = ENDNO of row above)
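
For reference, the rules above can be translated row by row into plain Python. This is only a minimal sketch, not a recommended approach for large frames: the helper name start_numbers is made up for illustration, ENDNO is assumed to be stored as integers rather than strings, and the rows are assumed to be sorted by KEY and then DATE.

```python
import pandas as pd

# Same sample data as in the question, with ENDNO as integers (an assumption).
df = pd.DataFrame({
    'KEY': ['1', '1', '1', '1', '1', '1', '1', '2', '2'],
    'DATE': ['2020-01-01'] * 3 + ['2020-01-08'] * 4 + ['2020-02-01'] * 2,
    'ENDNO': [1000, 1000, 1000, 2000, 2000, 2000, 2000, 400, 400],
    'ITEM': ['PAPERCLIPS', 'BINDERS', 'STAPLES', 'PAPERCLIPS', 'BINDERS',
             'STAPLES', 'TAPE', 'PENCILS', 'PENS'],
})


def start_numbers(frame):
    """Hypothetical helper: apply the three rules to each row in order."""
    starts = []
    prev = None
    for row in frame.itertuples(index=False):
        if prev is None or row.KEY != prev.KEY:
            start = 0              # new KEY: start from 0
        elif row.DATE == prev.DATE:
            start = starts[-1]     # same DATE: copy STARTNO of the row above
        else:
            start = prev.ENDNO     # new DATE: take ENDNO of the row above
        starts.append(start)
        prev = row
    return starts


df['STARTNO'] = start_numbers(df)
print(df[['KEY', 'DATE', 'STARTNO', 'ENDNO', 'ITEM']])
```

On the sample data this gives 0 for the first DATE of each KEY and 1000 for all four rows dated 2020-01-08, TAPE included.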

It should end up looking something like this:

KEY DATE        STARTNO ENDNO ITEM
1   2020-01-01  0       1000  PAPERCLIPS
1   2020-01-01  0       1000  BINDERS   
1   2020-01-01  0       1000  STAPLES   
1   2020-01-08  1000    2000  PAPERCLIPS
1   2020-01-08  1000    2000  BINDERS   
1   2020-01-08  1000    2000  STAPLES
1   2020-01-08  1000    2000  TAPE   
2   2020-02-01  0       400   PENCILS   
2   2020-02-01  0       400   PENS

If I were evaluating just one condition, I know I could use a lambda, but I'm not sure how to write a nested condition in Pandas that references the row above.

Would someone please point me in the right direction?

Thanks!

ETA:

Quang Hoang's answer almost got me what I needed. I realized I missed one aspect of my initial list.

I've added a new item called "TAPE" and updated the dataframe script above.

Grouping by KEY and ITEM works well for every item except TAPE. For TAPE, STARTNO goes back to 0; however, I need its STARTNO to match the STARTNO of the other items with the same KEY and DATE (that is, the ENDNO of the previous DATE for that KEY). If I change the code to:

df['STARTNO'] = df.groupby(['KEY','DATE'])['ENDNO'].shift(fill_value=0)

it starts the STARTNO back at 0 whenever the date changes, which is incorrect.

How do I change the code so that every row with the same KEY and DATE gets the ENDNO of the previous DATE for that KEY?
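
One possible way to get there, sketched under some assumptions: compute STARTNO once per (KEY, DATE) block and then copy it back onto every row of that block with a merge. The sketch assumes ENDNO is numeric, is identical for all rows within a (KEY, DATE) block (as in the sample), and that rows are sorted by DATE within each KEY. The names blocks and result are made up for illustration.

```python
import pandas as pd

# Sample data from the question, with ENDNO as integers (an assumption).
df = pd.DataFrame({
    'KEY': ['1', '1', '1', '1', '1', '1', '1', '2', '2'],
    'DATE': ['2020-01-01'] * 3 + ['2020-01-08'] * 4 + ['2020-02-01'] * 2,
    'ENDNO': [1000, 1000, 1000, 2000, 2000, 2000, 2000, 400, 400],
    'ITEM': ['PAPERCLIPS', 'BINDERS', 'STAPLES', 'PAPERCLIPS', 'BINDERS',
             'STAPLES', 'TAPE', 'PENCILS', 'PENS'],
})

# One row per (KEY, DATE) block; the first row of each block is kept.
blocks = df.drop_duplicates(['KEY', 'DATE']).copy()

# Within each KEY, a block starts where the previous block ended;
# the first block of a KEY starts at 0.
blocks['STARTNO'] = blocks.groupby('KEY')['ENDNO'].shift(fill_value=0)

# Copy the per-block STARTNO back onto every row of that block.
result = df.merge(blocks[['KEY', 'DATE', 'STARTNO']],
                  on=['KEY', 'DATE'], how='left')
print(result)
```

Because the shift happens on one row per block rather than one row per ITEM, an item that appears on only one date (such as TAPE) still picks up the STARTNO of its block.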

Answer 1

I think this is groupby().shift():

df['STARTNO'] = df.groupby(['KEY','ITEM'])['ENDNO'].shift(fill_value=0)

Output:

  KEY        DATE ENDNO        ITEM STARTNO
0   1  2020-01-01  1000  PAPERCLIPS       0
1   1  2020-01-01  1000     BINDERS       0
2   1  2020-01-01  1000     STAPLES       0
3   1  2020-01-08  2000  PAPERCLIPS    1000
4   1  2020-01-08  2000     BINDERS    1000
5   1  2020-01-08  2000     STAPLES    1000
6   2  2020-02-01   400     PENCILS       0
7   2  2020-02-01   400        PENS       0
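
The output above is for the data without TAPE. As a quick check, here is the same expression run on the updated frame from the question, again assuming ENDNO is stored as integers. It is only meant to show why TAPE ends up at 0: shift() works within each (KEY, ITEM) group, and TAPE forms a group with a single row, so it has no earlier row to take a value from.

```python
import pandas as pd

# Updated sample data from the question (with TAPE), ENDNO as integers (an assumption).
df = pd.DataFrame({
    'KEY': ['1', '1', '1', '1', '1', '1', '1', '2', '2'],
    'DATE': ['2020-01-01'] * 3 + ['2020-01-08'] * 4 + ['2020-02-01'] * 2,
    'ENDNO': [1000, 1000, 1000, 2000, 2000, 2000, 2000, 400, 400],
    'ITEM': ['PAPERCLIPS', 'BINDERS', 'STAPLES', 'PAPERCLIPS', 'BINDERS',
             'STAPLES', 'TAPE', 'PENCILS', 'PENS'],
})

# Previous ENDNO of the same ITEM within the same KEY; 0 when there is none.
df['STARTNO'] = df.groupby(['KEY', 'ITEM'])['ENDNO'].shift(fill_value=0)

# TAPE occurs only once for KEY 1, so its STARTNO is the fill value.
print(df.loc[df['ITEM'] == 'TAPE', ['KEY', 'DATE', 'ITEM', 'STARTNO']])
```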

1 answers

solution1
4 2021-02-09 16:18:25
