I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'KEY': ['1','1','1','1','1','1','1','2','2'], 'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08','2020-01-08','2020-01-08','2020-02-01','2020-02-01'], 'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'], 'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS','STAPLES','TAPE','PENCILS','PENS']})
KEY DATE ENDNO ITEM
1 2020-01-01 1000 PAPERCLIPS
1 2020-01-01 1000 BINDERS
1 2020-01-01 1000 STAPLES
1 2020-01-08 2000 PAPERCLIPS
1 2020-01-08 2000 BINDERS
1 2020-01-08 2000 STAPLES
1 2020-01-08 2000 TAPE
2 2020-02-01 400 PENCILS
2 2020-02-01 400 PENS
I need to add a new column called "STARTNO" and populate it based on multiple conditions:
if KEY <> KEY of row above (or there is no row above), STARTNO = 0
else:
    if DATE = DATE of row above, STARTNO = STARTNO of row above
    else STARTNO = ENDNO of row above
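These rules translate almost word for word into a loop that remembers the previous row. This is a minimal sketch, assuming the rows are already ordered by KEY and then DATE, and that ENDNO holds whole numbers stored as text (converted with astype(int)):

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '1', '1', '1', '1', '1', '1', '2', '2'],
    'DATE': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-08', '2020-01-08',
             '2020-01-08', '2020-01-08', '2020-02-01', '2020-02-01'],
    'ENDNO': ['1000', '1000', '1000', '2000', '2000', '2000', '2000', '400', '400'],
    'ITEM': ['PAPERCLIPS', 'BINDERS', 'STAPLES', 'PAPERCLIPS', 'BINDERS',
             'STAPLES', 'TAPE', 'PENCILS', 'PENS'],
})
# Assumption: ENDNO is numeric text, so convert it once up front.
df['ENDNO'] = df['ENDNO'].astype(int)

start_nos = []
prev = None  # the row above (None while processing the first row)
for row in df.itertuples(index=False):
    if prev is None or row.KEY != prev.KEY:
        start = 0                 # first row or new KEY: start at 0
    elif row.DATE == prev.DATE:
        start = start_nos[-1]     # same KEY and DATE: copy STARTNO of row above
    else:
        start = prev.ENDNO        # same KEY, new DATE: ENDNO of row above
    start_nos.append(start)
    prev = row

df['STARTNO'] = start_nos
print(df)
```

Because it loops in Python, this is mostly useful for checking the logic; on a large frame it will be slow.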
It should end up looking something like this:
KEY DATE STARTNO ENDNO ITEM
1 2020-01-01 0 1000 PAPERCLIPS
1 2020-01-01 0 1000 BINDERS
1 2020-01-01 0 1000 STAPLES
1 2020-01-08 1000 2000 PAPERCLIPS
1 2020-01-08 1000 2000 BINDERS
1 2020-01-08 1000 2000 STAPLES
1 2020-01-08 1000 2000 TAPE
2 2020-02-01 0 400 PENCILS
2 2020-02-01 0 400 PENS
If I were evaluating just one condition, I know I could use a lambda, but I'm not sure how to write a nested condition in pandas that references the row above.
Would someone please point me in the right direction?
Thanks!
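In pandas, "the row above" is usually expressed with Series.shift(), and nested if/else branches with where/mask instead of a lambda. Below is a vectorized sketch of the same rules, under the same assumptions (rows sorted by KEY then DATE, ENDNO convertible to int). Rows at a DATE boundary get the ENDNO of the row above, rows at a KEY boundary get 0, and every other row is left empty and forward-filled, which is exactly "STARTNO of row above":

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '1', '1', '1', '1', '1', '1', '2', '2'],
    'DATE': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-08', '2020-01-08',
             '2020-01-08', '2020-01-08', '2020-02-01', '2020-02-01'],
    'ENDNO': ['1000', '1000', '1000', '2000', '2000', '2000', '2000', '400', '400'],
    'ITEM': ['PAPERCLIPS', 'BINDERS', 'STAPLES', 'PAPERCLIPS', 'BINDERS',
             'STAPLES', 'TAPE', 'PENCILS', 'PENS'],
})
df['ENDNO'] = df['ENDNO'].astype(int)  # assumption: ENDNO is numeric text

key_changed = df['KEY'] != df['KEY'].shift()     # True on the first row of each KEY
date_changed = df['DATE'] != df['DATE'].shift()  # True on the first row of each DATE run

# DATE boundary: take ENDNO of the row above; all other rows stay NaN for now.
start = df['ENDNO'].shift().where(date_changed)
# KEY boundary: the rule says 0, overriding the DATE branch.
start = start.mask(key_changed, 0)
# Unchanged DATE: copy STARTNO of the row above, i.e. forward-fill the gaps.
df['STARTNO'] = start.ffill().astype(int)
print(df)
```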
ETA:
Quang Hoang's answer almost got me what I needed. I realized I missed one aspect of my initial list.
I've added a new item called "TAPE" and updated the dataframe script above.
The groupby approach works for every item except TAPE. Because TAPE has no earlier row with the same KEY and ITEM, its STARTNO is reset to 0. However, I need TAPE's STARTNO to be the same as the STARTNO of the other items with the same KEY and DATE (1000 in the expected output). If I change the code to:
df['STARTNO'] = df.groupby(['KEY','DATE'])['ENDNO'].shift(fill_value=0)
it resets STARTNO to 0 at the first row of every DATE, which is incorrect.
How do I change the code so that every row takes the ENDNO of the previous DATE for the same KEY (and 0 for the first DATE of each KEY)?
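One way to get this (a sketch, not the only approach) is to shift at the level of (KEY, DATE) blocks instead of items: keep one row per block, shift ENDNO within each KEY, and merge the result back onto every item. It assumes ENDNO is the same for all rows of a block, as in the example, and that ENDNO can be converted to int:

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '1', '1', '1', '1', '1', '1', '2', '2'],
    'DATE': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-08', '2020-01-08',
             '2020-01-08', '2020-01-08', '2020-02-01', '2020-02-01'],
    'ENDNO': ['1000', '1000', '1000', '2000', '2000', '2000', '2000', '400', '400'],
    'ITEM': ['PAPERCLIPS', 'BINDERS', 'STAPLES', 'PAPERCLIPS', 'BINDERS',
             'STAPLES', 'TAPE', 'PENCILS', 'PENS'],
})
df['ENDNO'] = df['ENDNO'].astype(int)  # assumption: ENDNO is numeric text

# One row per (KEY, DATE) block; assumes ENDNO is constant within a block.
blocks = df.drop_duplicates(['KEY', 'DATE'])[['KEY', 'DATE', 'ENDNO']]
# ISO date strings sort chronologically, so this orders blocks within each KEY.
blocks = blocks.sort_values(['KEY', 'DATE'])
# Within each KEY, a block's STARTNO is the previous block's ENDNO (0 for the first).
blocks['STARTNO'] = blocks.groupby('KEY')['ENDNO'].shift(fill_value=0)
# Broadcast the block-level STARTNO back onto every item row (left merge keeps row order).
df = df.merge(blocks[['KEY', 'DATE', 'STARTNO']], on=['KEY', 'DATE'], how='left')
print(df)
```

Because the shift happens between blocks rather than between items, an item that first appears on a later DATE, such as TAPE, gets the same STARTNO as the other items on that DATE.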
I think this is groupby().shift():
df['STARTNO'] = df.groupby(['KEY','ITEM'])['ENDNO'].shift(fill_value=0)
Output:
KEY DATE ENDNO ITEM STARTNO
0 1 2020-01-01 1000 PAPERCLIPS 0
1 1 2020-01-01 1000 BINDERS 0
2 1 2020-01-01 1000 STAPLES 0
3 1 2020-01-08 2000 PAPERCLIPS 1000
4 1 2020-01-08 2000 BINDERS 1000
5 1 2020-01-08 2000 STAPLES 1000
6 2 2020-02-01 400 PENCILS 0
7 2 2020-02-01 400 PENS 0
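Running this one-liner on the updated frame reproduces the TAPE problem described in the ETA. The sketch below converts ENDNO to int first (an assumption; with the original text column, shift(fill_value=0) would mix the string values with an integer 0):

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '1', '1', '1', '1', '1', '1', '2', '2'],
    'DATE': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-08', '2020-01-08',
             '2020-01-08', '2020-01-08', '2020-02-01', '2020-02-01'],
    'ENDNO': ['1000', '1000', '1000', '2000', '2000', '2000', '2000', '400', '400'],
    'ITEM': ['PAPERCLIPS', 'BINDERS', 'STAPLES', 'PAPERCLIPS', 'BINDERS',
             'STAPLES', 'TAPE', 'PENCILS', 'PENS'],
})
df['ENDNO'] = df['ENDNO'].astype(int)  # assumption: ENDNO is numeric text

# The answer's approach: shift ENDNO down by one row within each (KEY, ITEM) pair.
df['STARTNO'] = df.groupby(['KEY', 'ITEM'])['ENDNO'].shift(fill_value=0)
print(df)
```

Every row except TAPE matches the expected table. TAPE has no earlier row in its (KEY, ITEM) group, so it falls back to the fill value 0.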