Pandas: Create a new column and populate it with the previous row's value based on conditions

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'KEY': ['1','1','1','1','1','1','1','2','2'],
    'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08','2020-01-08','2020-01-08','2020-02-01','2020-02-01'],
    'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'],
    'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS','STAPLES','TAPE','PENCILS','PENS'],
})

KEY DATE        ENDNO ITEM
1   2020-01-01  1000  PAPERCLIPS
1   2020-01-01  1000  BINDERS   
1   2020-01-01  1000  STAPLES   
1   2020-01-08  2000  PAPERCLIPS
1   2020-01-08  2000  BINDERS   
1   2020-01-08  2000  STAPLES
1   2020-01-08  2000  TAPE
2   2020-02-01  400   PENCILS   
2   2020-02-01  400   PENS

I need to add a new column called "STARTNO" and populate it based on multiple conditions:

if KEY <> KEY of row above, STARTNO = 0
else
   (if DATE = DATE of row above, STARTNO = STARTNO of row above
    else STARTNO = ENDNO of row above)

It should end up looking something like this:

KEY DATE        STARTNO ENDNO ITEM
1   2020-01-01  0       1000  PAPERCLIPS
1   2020-01-01  0       1000  BINDERS   
1   2020-01-01  0       1000  STAPLES   
1   2020-01-08  1000    2000  PAPERCLIPS
1   2020-01-08  1000    2000  BINDERS   
1   2020-01-08  1000    2000  STAPLES
1   2020-01-08  1000    2000  TAPE   
2   2020-02-01  0       400   PENCILS   
2   2020-02-01  0       400   PENS

If I were evaluating just one condition, I know I could use a lambda, but I'm not sure how to write a nested condition in Pandas that references the row above.

Would someone please point me in the right direction?

Thanks!

ETA:

Quang Hoang's answer almost got me what I needed. I realized I missed one aspect of my initial list.

I've added a new item called "TAPE" and updated the dataframe script above.

Applying the groupby clause works well for all items except TAPE. With TAPE, it puts the STARTNO back at 0; however, I actually need TAPE to get the same STARTNO as the other items with the same KEY and DATE (1000 in the expected output above). If I change the code to:

df['STARTNO'] = df.groupby(['KEY','DATE'])['ENDNO'].shift(fill_value=0)

it starts the STARTNO back at 0 whenever the date changes, which is incorrect.

How do I change the code so that it takes the ENDNO for the previous row when the KEY and DATE match?

Answer 1

I think this is groupby().shift():

df['STARTNO'] = df.groupby(['KEY','ITEM'])['ENDNO'].shift(fill_value=0)

Output:

  KEY        DATE ENDNO        ITEM STARTNO
0   1  2020-01-01  1000  PAPERCLIPS       0
1   1  2020-01-01  1000     BINDERS       0
2   1  2020-01-01  1000     STAPLES       0
3   1  2020-01-08  2000  PAPERCLIPS    1000
4   1  2020-01-08  2000     BINDERS    1000
5   1  2020-01-08  2000     STAPLES    1000
6   2  2020-02-01   400     PENCILS       0
7   2  2020-02-01   400        PENS       0
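
For the edited question (the TAPE row), one possible approach is to apply the stated rule directly with shift(), where()/mask() and ffill() instead of grouping by ITEM. The following is only a sketch, not part of the original answer. It assumes the rows are already ordered so that each KEY and each DATE within a KEY is contiguous, and it converts ENDNO to integers, which the original dataframe stores as strings. The helper names prev_end, new_key and new_date are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1','1','1','1','1','1','1','2','2'],
    'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08',
             '2020-01-08','2020-01-08','2020-02-01','2020-02-01'],
    'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'],
    'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS',
             'STAPLES','TAPE','PENCILS','PENS'],
})

# Assumption: work with numeric ENDNO so STARTNO ends up with a single dtype.
df['ENDNO'] = df['ENDNO'].astype(int)

# ENDNO of the row above (NaN for the very first row).
prev_end = df['ENDNO'].shift()

# True where KEY / DATE differ from the row above (the first row counts as new).
new_key = df['KEY'] != df['KEY'].shift()
new_date = df['DATE'] != df['DATE'].shift()

# Same KEY but new DATE: take the ENDNO of the row above; leave other rows NaN.
start = prev_end.where(new_date & ~new_key)
# New KEY: STARTNO restarts at 0.
start = start.mask(new_key, 0)
# Same KEY and same DATE: carry the STARTNO of the row above forward.
df['STARTNO'] = start.ffill().astype(int)

print(df[['KEY', 'DATE', 'STARTNO', 'ENDNO', 'ITEM']])
```

Because the STARTNO is decided only by KEY and DATE changes, an item such as TAPE that has no earlier row of its own still picks up the value carried forward from the rows above it. If the data is not sorted as assumed, it would need to be sorted by KEY and DATE first.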
