
Pandas: Create new column and populate with value from previous row based on conditions

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'KEY': ['1','1','1','1','1','1','1','2','2'],
    'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08','2020-01-08','2020-01-08','2020-02-01','2020-02-01'],
    'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'],
    'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS','STAPLES','TAPE','PENCILS','PENS'],
})

KEY DATE        ENDNO ITEM
1   2020-01-01  1000  PAPERCLIPS
1   2020-01-01  1000  BINDERS   
1   2020-01-01  1000  STAPLES   
1   2020-01-08  2000  PAPERCLIPS
1   2020-01-08  2000  BINDERS   
1   2020-01-08  2000  STAPLES
1   2020-01-08  2000  TAPE
2   2020-02-01  400   PENCILS   
2   2020-02-01  400   PENS      

I need to add a new column called "STARTNO" and populate it based on multiple conditions:

if KEY <> KEY of row above, STARTNO = 0
else
   (if DATE = DATE of row above, STARTNO = STARTNO of row above
    else STARTNO = ENDNO of row above)

It should end up looking something like this:

KEY DATE        STARTNO ENDNO ITEM
1   2020-01-01  0       1000  PAPERCLIPS
1   2020-01-01  0       1000  BINDERS   
1   2020-01-01  0       1000  STAPLES   
1   2020-01-08  1000    2000  PAPERCLIPS
1   2020-01-08  1000    2000  BINDERS   
1   2020-01-08  1000    2000  STAPLES
1   2020-01-08  1000    2000  TAPE   
2   2020-02-01  0       400   PENCILS   
2   2020-02-01  0       400   PENS      

If I were just evaluating one statement, I know I could use lambdas, but I'm not sure how to do a nested statement in Pandas and reference the row above.

Would someone please point me in the right direction?

Thanks!

ETA (edited to add):

Quang Hoang's answer almost got me what I needed. I realized I missed one aspect of my initial list.

I've added a new item called "TAPE" and updated the dataframe script above.

Applying the groupby clause works well for all items except TAPE. With TAPE, it puts the STARTNO back at 0; however, I actually need the STARTNO to be the same as the ENDNO for the previous items with the same KEY and DATE. If I change the code to:

df['STARTNO'] = df.groupby(['KEY','DATE'])['ENDNO'].shift(fill_value=0)

it starts the STARTNO back at 0 whenever the date changes, which is incorrect.

How do I change the code so that it takes the ENDNO for the previous row when the KEY and DATE match?

I think this is groupby().shift():

df['STARTNO'] = df.groupby(['KEY','ITEM'])['ENDNO'].shift(fill_value=0)

Output:

  KEY        DATE ENDNO        ITEM STARTNO
0   1  2020-01-01  1000  PAPERCLIPS       0
1   1  2020-01-01  1000     BINDERS       0
2   1  2020-01-01  1000     STAPLES       0
3   1  2020-01-08  2000  PAPERCLIPS    1000
4   1  2020-01-08  2000     BINDERS    1000
5   1  2020-01-08  2000     STAPLES    1000
6   2  2020-02-01   400     PENCILS       0
7   2  2020-02-01   400        PENS       0
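
For the updated frame that includes TAPE, grouping by ITEM finds no earlier TAPE row to shift from, so that row falls back to 0. One possible way around this, offered as a minimal sketch rather than a definitive fix, is to shift once per (KEY, DATE) pair instead of once per row and then map the result back onto every row. The names per_date and result are made up for illustration. The sketch assumes that ENDNO is the same for every row sharing a KEY and DATE, and that rows are already ordered by DATE within each KEY, as they are in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1','1','1','1','1','1','1','2','2'],
    'DATE': ['2020-01-01','2020-01-01','2020-01-01','2020-01-08','2020-01-08',
             '2020-01-08','2020-01-08','2020-02-01','2020-02-01'],
    'ENDNO': ['1000','1000','1000','2000','2000','2000','2000','400','400'],
    'ITEM': ['PAPERCLIPS','BINDERS','STAPLES','PAPERCLIPS','BINDERS','STAPLES',
             'TAPE','PENCILS','PENS'],
})

# Assumption: ENDNO is constant within each (KEY, DATE) pair, so keeping the
# first row of each pair loses no information.
per_date = df[['KEY', 'DATE', 'ENDNO']].drop_duplicates(['KEY', 'DATE']).copy()

# Within each KEY, the previous date's ENDNO becomes this date's STARTNO.
# ENDNO holds strings in the sample, so the fill value is the string '0'.
per_date['STARTNO'] = per_date.groupby('KEY')['ENDNO'].shift(fill_value='0')

# Attach the per-date STARTNO to every row with the same KEY and DATE.
result = df.merge(per_date[['KEY', 'DATE', 'STARTNO']],
                  on=['KEY', 'DATE'], how='left')

print(result[['KEY', 'DATE', 'STARTNO', 'ENDNO', 'ITEM']])
```

On the sample data this should give a STARTNO of 0 for the 2020-01-01 rows, 1000 for all four 2020-01-08 rows (TAPE included), and 0 for both KEY 2 rows. If ENDNO can differ between rows of the same KEY and DATE, the drop_duplicates step would need to be replaced by an explicit choice of which ENDNO to carry forward.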


