Creating new column in a Pandas df, where each row's value depends on the value of a different column in the row immediately above it
Assume the following Pandas df:
# Import dependency.
import pandas as pd
# Create data for df.
data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1]
}
# Create DataFrame
df = pd.DataFrame(data)
display(df)
I want to add a new column to the df called 'Placeholder'. The value of Placeholder would be derived from the 'Dummy_Variable' column according to the following rules:
The desired result is a df with a new 'Placeholder' column that looks like the df generated by running the code below:
desired_data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1],
'Placeholder': [1000,1020,1011,1011,1011,1011,1001,1001,1121,1131]}
df1 = pd.DataFrame(desired_data)
display(df1)
I can do this easily in Excel, but I cannot figure out how to do it in Pandas without using a loop. Any help is greatly appreciated. Thanks!
You can use np.where for this:
import pandas as pd
import numpy as np
data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1]
}
df = pd.DataFrame(data)
df['Placeholder'] = np.where((df.Dummy_Variable.cumsum() == 0) | (df.Dummy_Variable == 1), df.Value, np.nan)
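# now forward fill the remaining NaNs
# (assign the result back; fillna(method='ffill', inplace=True) on a selected column is deprecated in recent pandas)
df['Placeholder'] = df['Placeholder'].ffill()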
df
   Value  Dummy_Variable  Placeholder
0   1000               0       1000.0
1   1020               0       1020.0
2   1011               1       1011.0
3   1010               0       1011.0
4   1030               0       1011.0
5    950               0       1011.0
6   1001               1       1001.0
7   1100               0       1001.0
8   1121               1       1121.0
9   1131               1       1131.0
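To see why the np.where condition picks the right rows, it may help to look at its pieces separately. The sketch below rebuilds the two boolean tests from the answer in a scratch frame; the name `steps` and its column names are made up here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
    'Dummy_Variable': [0, 0, 1, 0, 0, 0, 1, 0, 1, 1],
})

# Scratch frame (hypothetical names) holding each intermediate of the condition.
steps = pd.DataFrame({
    'Dummy_Variable': df['Dummy_Variable'],
    # Running count of flags seen so far; it stays 0 until the first 1 appears.
    'cumsum': df['Dummy_Variable'].cumsum(),
})
# True only for the rows that come before the first flagged row.
steps['before_first_flag'] = steps['cumsum'] == 0
# True for the flagged rows themselves.
steps['is_flag'] = steps['Dummy_Variable'] == 1
# Rows where Placeholder takes the row's own Value; the rest become NaN and get forward filled.
steps['keep_value'] = steps['before_first_flag'] | steps['is_flag']

print(steps)
```

Rows where `keep_value` is False are exactly the ones that end up inheriting the last kept Value once the forward fill runs.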
# check output:
desired_data = {'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
'Dummy_Variable': [0,0,1,0,0,0,1,0,1,1],
'Placeholder': [1000,1020,1011,1011,1011,1011,1001,1001,1121,1131]}
df1 = pd.DataFrame(desired_data)
check = df['Placeholder'] == df1['Placeholder']
check.sum()==len(df1)
# True
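If you would rather stay within pandas and keep the column as integers, one possible variant is to use Series.where with the same condition and then forward fill. This is only a sketch of an alternative, not part of the original answer; the variable name `keep` is my own, and the final astype(int) assumes no NaN remains, which holds here because the first row always keeps its own Value.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [1000, 1020, 1011, 1010, 1030, 950, 1001, 1100, 1121, 1131],
    'Dummy_Variable': [0, 0, 1, 0, 0, 0, 1, 0, 1, 1],
})

# Rows that keep their own Value: flagged rows, plus every row before the first flag.
keep = (df['Dummy_Variable'] == 1) | (df['Dummy_Variable'].cumsum() == 0)

# Series.where keeps Value where `keep` is True and puts NaN elsewhere;
# ffill carries the last kept Value downward, and astype(int) restores integers.
df['Placeholder'] = df['Value'].where(keep).ffill().astype(int)

print(df)
```

Because the result is cast back to int, the new column can be compared against the desired integer column without the float values seen in the np.where output.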