Creating a new column in a Pandas df, where each row's value depends on the value of a different column in the row immediately above it

Assume the following Pandas df:

# Import dependency.
import pandas as pd

# Create data for df.
data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1]
       }

# Create DataFrame
df = pd.DataFrame(data)
display(df)

I want to add a new column to the df called 'Placeholder'. The value of 'Placeholder' would be derived from the 'Dummy_Variable' column according to the following rules:

  • If all previous rows had a 'Dummy_Variable' value of 0, then the 'Placeholder' value for that row would be equal to the 'Value' for that row.
  • If the 'Dummy_Variable' value for a row equals 1, then the 'Placeholder' value for that row would be equal to the 'Value' for that row.
  • If the 'Dummy_Variable' value for a row equals 0 but the 'Placeholder' value for the row immediately above it is > 0, then the 'Placeholder' value for the row would be equal to the 'Placeholder' value for the row immediately above it.
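
For reference, the three rules can be written out as a plain loop. This is only a sketch to check a vectorised solution against: the helper name placeholder_by_loop is made up for illustration, and it assumes the first rule takes precedence over the third while no 1 has appeared yet.

```python
import pandas as pd


def placeholder_by_loop(df):
    """Apply the three rules row by row (hypothetical helper, for checking only)."""
    result = []
    seen_one = False  # True once any row so far had Dummy_Variable == 1
    rows = zip(df['Value'].tolist(), df['Dummy_Variable'].tolist())
    for value, dummy in rows:
        if dummy == 1:
            seen_one = True
            result.append(value)       # rule 2: a 1 on this row takes its own Value
        elif not seen_one:
            result.append(value)       # rule 1: only 0s so far, take its own Value
        else:
            result.append(result[-1])  # rule 3: carry the Placeholder from the row above
    return result


data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0, 0, 1, 0, 0, 0, 1, 0, 1, 1]}
df = pd.DataFrame(data)

print(placeholder_by_loop(df))
```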

The desired result is a df with a new 'Placeholder' column that looks like the df generated by running the code below:

desired_data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1],
        'Placeholder': [1000,1020,1011,1011,1011,1011,1001,1001,1121,1131]}

df1 = pd.DataFrame(desired_data)
display(df1)

I can do this easily in Excel, but I cannot figure out how to do it in Pandas without using a loop. Any help is greatly appreciated. Thanks!

You can use np.where for this:

import pandas as pd
import numpy as np

data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1]
       }

df = pd.DataFrame(data)

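# Keep Value where no 1 has been seen yet (cumsum is still 0) or where the row itself is 1;
# every other row is set to NaN for now.
df['Placeholder'] = np.where((df.Dummy_Variable.cumsum() == 0) | (df.Dummy_Variable == 1), df.Value, np.nan)

# Now forward fill the remaining NaNs.
df['Placeholder'] = df['Placeholder'].ffill()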

df

   Value  Dummy_Variable  Placeholder
0   1000               0       1000.0
1   1020               0       1020.0
2   1011               1       1011.0
3   1010               0       1011.0
4   1030               0       1011.0
5    950               0       1011.0
6   1001               1       1001.0
7   1100               0       1001.0
8   1121               1       1121.0
9   1131               1       1131.0


# check output:
desired_data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1],
        'Placeholder': [1000,1020,1011,1011,1011,1011,1001,1001,1121,1131]}

df1 = pd.DataFrame(desired_data)

check = df['Placeholder'] == df1['Placeholder']
check.sum()==len(df1)
# True
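
If you would rather stay in pandas and get integers back instead of the float column shown in the output above, the same idea can be sketched with Series.where and ffill. The name keep_own is just an illustrative choice, and the final astype(int) assumes 'Value' itself contains no NaNs.

```python
import pandas as pd

data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
        'Dummy_Variable': [0, 0, 1, 0, 0, 0, 1, 0, 1, 1]}
df = pd.DataFrame(data)

# Rows that keep their own Value: a 1 on this row, or no 1 seen so far.
keep_own = (df['Dummy_Variable'] == 1) | (df['Dummy_Variable'].cumsum() == 0)

# where() sets the other rows to NaN, ffill() carries the last kept value down,
# and astype(int) undoes the float conversion caused by the temporary NaNs.
df['Placeholder'] = df['Value'].where(keep_own).ffill().astype(int)

print(df)
```

The first row always satisfies one of the two conditions, so there is never a leading NaN left over after the forward fill.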

