Pandas: Create new column and populate with value from previous row based on conditions
I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'KEY': ['1','1','1','1','1','1','1','2','2'], 'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08','2020-01-08','2020-01-08','2020-02-01','2020-02-01'], 'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'], 'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS','STAPLES','TAPE','PENCILS','PENS']})
KEY DATE ENDNO ITEM
1 2020-01-01 1000 PAPERCLIPS
1 2020-01-01 1000 BINDERS
1 2020-01-01 1000 STAPLES
1 2020-01-08 2000 PAPERCLIPS
1 2020-01-08 2000 BINDERS
1 2020-01-08 2000 STAPLES
1 2020-01-08 2000 TAPE
2 2020-02-01 400 PENCILS
2 2020-02-01 400 PENS
I need to add a new column called "STARTNO" and populate it based on multiple conditions:
if KEY <> KEY of row above, STARTNO = 0
else
(if DATE = DATE of row above, STARTNO = STARTNO of row above
else STARTNO = ENDNO of row above)
It should end up looking something like this:
KEY DATE STARTNO ENDNO ITEM
1 2020-01-01 0 1000 PAPERCLIPS
1 2020-01-01 0 1000 BINDERS
1 2020-01-01 0 1000 STAPLES
1 2020-01-08 1000 2000 PAPERCLIPS
1 2020-01-08 1000 2000 BINDERS
1 2020-01-08 1000 2000 STAPLES
1 2020-01-08 1000 2000 TAPE
2 2020-02-01 0 400 PENCILS
2 2020-02-01 0 400 PENS
If I were just evaluating one statement, I know I could use lambdas, but I'm not sure how to write a nested condition in Pandas that references the row above.
Would someone please point me in the right direction?
Thanks!
ETA:
Quang Hoang's answer almost got me what I needed. I realized I missed one aspect of my initial list. I've added a new item called "TAPE" and updated the dataframe script above.
Applying the groupby clause works well for all items except TAPE. With TAPE, it puts the STARTNO back at 0; however, I actually need the STARTNO to be the same as the ENDNO for the previous items with the same KEY and DATE. If I change the code to:
df['STARTNO'] = df.groupby(['KEY','DATE'])['ENDNO'].shift(fill_value=0)
it starts the STARTNO back at 0 whenever the date changes, which is incorrect.
How do I change the code so that it takes the ENDNO for the previous row when the KEY and DATE match?
I think this is groupby().shift():
df['STARTNO'] = df.groupby(['KEY','ITEM'])['ENDNO'].shift(fill_value=0)
Output:
KEY DATE ENDNO ITEM STARTNO
0 1 2020-01-01 1000 PAPERCLIPS 0
1 1 2020-01-01 1000 BINDERS 0
2 1 2020-01-01 1000 STAPLES 0
3 1 2020-01-08 2000 PAPERCLIPS 1000
4 1 2020-01-08 2000 BINDERS 1000
5 1 2020-01-08 2000 STAPLES 1000
6 2 2020-02-01 400 PENCILS 0
7 2 2020-02-01 400 PENS 0
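For the updated data that includes TAPE, grouping on ['KEY','ITEM'] gives TAPE a STARTNO of 0, because it is the only TAPE row for that KEY and shift() has nothing earlier to pull from. One possible way around that, offered as a sketch rather than a drop-in fix, is to work out one STARTNO per (KEY, DATE) pair and then merge it back onto every row. This assumes the rows are already sorted by KEY and DATE, that ENDNO is the same for all rows sharing a KEY and DATE, and that it is acceptable to cast ENDNO to integers. The name per_date is a helper I introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '1', '1', '1', '1', '1', '1', '2', '2'],
    'DATE': ['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-08',
             '2020-01-08', '2020-01-08', '2020-01-08', '2020-02-01', '2020-02-01'],
    'ENDNO': ['1000', '1000', '1000', '2000', '2000', '2000', '2000', '400', '400'],
    'ITEM': ['PAPERCLIPS', 'BINDERS', 'STAPLES', 'PAPERCLIPS', 'BINDERS',
             'STAPLES', 'TAPE', 'PENCILS', 'PENS'],
})

# Assumption: numeric ENDNO is fine, so the fill value 0 has a matching dtype.
df['ENDNO'] = df['ENDNO'].astype(int)

# One row per (KEY, DATE), keeping that date's ENDNO.
per_date = df.drop_duplicates(['KEY', 'DATE'])[['KEY', 'DATE', 'ENDNO']].copy()

# Within each KEY, the previous date's ENDNO becomes this date's STARTNO;
# the first date of each KEY has no predecessor and gets 0.
per_date['STARTNO'] = per_date.groupby('KEY')['ENDNO'].shift(fill_value=0)

# Attach the per-date STARTNO to every item row, TAPE included.
df = df.merge(per_date[['KEY', 'DATE', 'STARTNO']], on=['KEY', 'DATE'], how='left')

print(df[['KEY', 'DATE', 'STARTNO', 'ENDNO', 'ITEM']])
# STARTNO column: 0, 0, 0, 1000, 1000, 1000, 1000, 0, 0
```

Because the shift happens on the de-duplicated (KEY, DATE) table, an item that appears on only one date still inherits the STARTNO of the other items on that date, which matches the expected table in the question.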