Create New Pandas DataFrame Column with Values using Previous Row
If we have a pandas DataFrame containing the following values:
```
            x
date
2017-07-30  1
2017-07-31  2
2017-08-01  3
2017-08-02  4
```
how can we create a new column `y` whose value is calculated as

`today's y = 2 * (previous day's y) + today's x`

where `y` for the oldest date is 1?
Expected result:
```
            x   y
date
2017-07-30  1   1
2017-07-31  2   4
2017-08-01  3  11
2017-08-02  4  26
```
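Written out, the rule is a first-order linear recurrence. Unrolling it on the sample data reproduces the expected column, and it also shows that each `y` is a weighted sum of all earlier `x` values whose weights double going back in time. The closed form in the last line relies on `y` for the oldest date being equal to `x` there (both are 1 in this data):

```latex
\begin{aligned}
y_0 &= 1, \qquad y_t = 2\,y_{t-1} + x_t \quad (t \ge 1) \\
y_1 &= 2 \cdot 1 + 2 = 4, \qquad y_2 = 2 \cdot 4 + 3 = 11, \qquad y_3 = 2 \cdot 11 + 4 = 26 \\
y_t &= \sum_{k=0}^{t} 2^{\,t-k}\, x_k \qquad (\text{valid because } y_0 = x_0)
\end{aligned}
```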
Attempt:
```python
import pandas as pd

d = {
    'date': ['2017-07-30', '2017-07-31', '2017-08-01', '2017-08-02'],
    'x': [1, 2, 3, 4]
}
df = pd.DataFrame.from_dict(d).set_index('date')
df['y'] = 1  # seed the column with the starting value
df['y'] = df['y'].shift(1)*2 + df['x']  # intended: 2 * previous y + x
print(df)
```
Attempt's result:
```
            x    y
date
2017-07-30  1  NaN
2017-07-31  2  4.0
2017-08-01  3  5.0
2017-08-02  4  6.0
```
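The attempt produces these numbers because `df['y'].shift(1)` is evaluated once, on the column as it stood before the assignment (all 1s after `df['y'] = 1`). Every row therefore computes `2*1 + x`, the first row is NaN because nothing is shifted into it, and the newly computed values are never fed back into later rows. A small check of that reading, rebuilding the question's frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.Index(["2017-07-30", "2017-07-31", "2017-08-01", "2017-08-02"], name="date"),
)
df["y"] = 1

# shift(1) sees the constant column [NaN, 1, 1, 1], not the values being computed,
# so each row is just 2 * 1 + x
attempt = df["y"].shift(1) * 2 + df["x"]
print(attempt.tolist())  # [nan, 4.0, 5.0, 6.0]
```

A vectorized expression cannot refer to its own output, which is why the answers below either unroll the recurrence or iterate.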
IIUC, `cumsum`?
```python
df.x.cumsum()
```

Output:

```
date
2017-07-30     1
2017-07-31     3
2017-08-01     6
2017-08-02    10
Name: x, dtype: int64
```
Update:
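```python
import numpy as np

n = 2
s = n ** np.arange(len(df))[::-1]  # weights [..., n**2, n, 1], newest row last
# each growing window x_0..x_t is weighted by n**t, ..., n, 1 and summed
df.x.rolling(window=len(df), min_periods=1).apply(lambda x: np.sum(x * s[-len(x):]), raw=True)
```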
Output:

```
date
2017-07-30     1.0
2017-07-31     4.0
2017-08-01    11.0
2017-08-02    26.0
Name: x, dtype: float64
```
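The rolling version re-sums every window, so its cost grows quadratically with the number of rows. Because the weights are powers of 2, the same weighted sum can be pulled apart as `y_t = 2**t * sum(x_k / 2**k for k <= t)`, which is a plain `cumsum` after rescaling. This is a swapped-in technique, not part of the answer above: a minimal sketch that assumes `y` for the oldest date equals `x` there (true in the question's data) and that the series is short enough for `2.0 ** t` to stay within float64 range and precision:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.Index(["2017-07-30", "2017-07-31", "2017-08-01", "2017-08-02"], name="date"),
)

# Unrolled recurrence: y_t = sum_{k<=t} 2**(t-k) * x_k = 2**t * sum_{k<=t} x_k / 2**k
# Assumes the first y equals the first x; float64 overflows for very long series.
p = 2.0 ** np.arange(len(df))         # 1, 2, 4, 8, ...
df["y"] = p * (df["x"] / p).cumsum()  # rescale, cumulative sum, scale back
print(df["y"].tolist())  # [1.0, 4.0, 11.0, 26.0]
```

The trade-off is numerical: the intermediate terms `x_k / 2**k` shrink quickly, so for long series with large growth factors an explicit loop is the safer choice.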
What you describe is a recursive calculation, and in pandas the general way to do it is to use `expanding` objects with a custom function:
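```python
from functools import reduce  # reduce lives in functools in Python 3

# fold each growing window left to right: prev -> 2*prev + value
df['x'].expanding().apply(lambda r: reduce(lambda prev, value: 2*prev + value, r), raw=True)
```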
Output:

```
date
2017-07-30     1.0
2017-07-31     4.0
2017-08-01    11.0
2017-08-02    26.0
Name: x, dtype: float64
```
See one of my previous answers for a detailed discussion of the performance of `expanding` (tl;dr: a for loop is generally better).
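Following that tl;dr, a plain loop visits each row once and carries the previous `y` forward. A minimal sketch; `recurrence` is a hypothetical helper name, and starting from `prev = 0` assumes the first `y` equals the first `x` (both are 1 in the question). If a different starting value were required, the first element would be set separately:

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.Index(["2017-07-30", "2017-07-31", "2017-08-01", "2017-08-02"], name="date"),
)


def recurrence(xs, factor=2):
    """Hypothetical helper: y_t = factor * y_{t-1} + x_t, starting from y_{-1} = 0."""
    out = []
    prev = 0
    for value in xs:
        prev = factor * prev + value  # today's y from yesterday's y and today's x
        out.append(prev)
    return out


df["y"] = recurrence(df["x"].tolist())
print(df)
```

Keeping the values as Python integers means the result stays exact (here `int64` `[1, 4, 11, 26]`) instead of becoming float64 as in the `rolling` and `expanding` versions.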