Calculate value using previously-calculated value (from the same column) and value from another column in a Pandas DataFrame

After hours of trying to learn how to do this, I'm reaching out to the community.

I'm starting with the following:

                perf
date                
2018-06-01  0.012923
2018-06-02  0.039364
2018-06-03  0.042805
2018-06-04 -0.033214
2018-06-05 -0.021745

I need to calculate a cumulative percentage change in a new column, and the calculation must use 100 as the starting value. So I prepend a single row holding the 100:

                perf  pct_change
date                            
2018-05-31       NaN       100.0
2018-06-01  0.012923         NaN
2018-06-02  0.039364         NaN
2018-06-03  0.042805         NaN
2018-06-04 -0.033214         NaN

What I need to get is this:

                perf  pct_change
date                            
2018-05-31       NaN       100.0
2018-06-01  0.012923    101.2923
2018-06-02  0.039364 105.2795701
2018-06-03  0.042805 109.7860621
2018-06-04 -0.033214 106.1396278

The formula is something like pct_change = previous_days_pct_change * ( 1 + perf )

I tried a few different approaches, including a for ... in loop, with no success.

# INCOMPLETE/DOES NOT WORK (adding for illustration purposes only)
for index, row in performance.iterrows():
    curr = performance.loc[index, 'perf']
    pidx = index + pd.DateOffset(-1)
    prev = performance.iloc[[pidx], 'pct_change']
    performance.loc[index, 'pct_change'] = prev * ( 1 + curr )

I also tried:

performance['pct_change'] = performance['pct_change'].shift() * ( 1 + performance['perf'] )

Which yields:

                perf  pct_change
date                            
2018-05-31       NaN         NaN
2018-06-01  0.012923  101.292251
2018-06-02  0.039364         NaN
2018-06-03  0.042805         NaN
2018-06-04 -0.033214         NaN

But that only gives me the one value.

I suspect there is already a much simpler way to do what I'm trying to do, but I'm just not finding it. Any help would be appreciated. This is very easy to do in a spreadsheet, but I want to learn how to do it in Pandas.

Thank you

Using cumprod:

df['pct_change'] = (df['perf']+1).cumprod() * 100

achieves what you actually want:

pct_change_0 = (perf_0 + 1) * 100
pct_change_1 = pct_change_0 * (perf_1 + 1) = (perf_0 + 1) * (perf_1 + 1) * 100
pct_change_2 = pct_change_1 * (perf_2 + 1) = (perf_0 + 1) * (perf_1 + 1) * (perf_2 + 1) * 100
...
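The recurrence above can be checked numerically. The following is a minimal sketch, assuming the perf values from the question and a starting value of 100; the names looped and vectorised are illustrative only. It runs the explicit step-by-step loop and compares the result with the cumprod expression.

```python
import pandas as pd

# perf values copied from the question
perf = pd.Series([0.012923, 0.039364, 0.042805, -0.033214, -0.021745])

# Explicit recurrence: each value is the previous value times (1 + perf)
values = []
previous = 100.0  # starting value from the question
for p in perf:
    previous = previous * (1 + p)
    values.append(previous)
looped = pd.Series(values, index=perf.index)

# Vectorised equivalent using the cumulative product
vectorised = (perf + 1).cumprod() * 100

# The two agree up to floating-point rounding
print(((looped - vectorised).abs() < 1e-9).all())
```

The loop is only there to make the equivalence visible; the cumprod form avoids row-by-row iteration.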

So you are actually computing the cumulative product of the perf values (or, to be more accurate, of the perf + 1 values), scaled by 100.

Like so:

import pandas as pd

# Build the example frame with a DatetimeIndex
# (pd.datetime is no longer available in recent pandas versions, so pd.to_datetime is used instead)
dates = pd.to_datetime(['2018-06-01', '2018-06-02', '2018-06-03', '2018-06-04', '2018-06-05'])
perfs = [0.012923, 0.039364, 0.042805, -0.033214, -0.021745]
df = pd.DataFrame({'perf': perfs}, index=dates)

# The important bit:
df['pct_change'] = ((df['perf'] + 1).cumprod() * 100)

df
#                 perf  pct_change
# 2018-06-01  0.012923  101.292300
# 2018-06-02  0.039364  105.279570
# 2018-06-03  0.042805  109.786062
# 2018-06-04 -0.033214  106.139628
# 2018-06-05 -0.021745  103.831622
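If the result should also contain the prepended 2018-05-31 row holding 100, as in the question's desired output, one possible approach is to treat the missing perf on that row as zero growth. This is a sketch under assumptions: the frame is named performance as in the question, the base row has NaN in perf, and start_value and growth are illustrative names that do not appear in the original posts.

```python
import numpy as np
import pandas as pd

# Frame laid out as in the question: a base row with no perf value comes first
index = pd.to_datetime(['2018-05-31', '2018-06-01', '2018-06-02',
                        '2018-06-03', '2018-06-04', '2018-06-05'])
performance = pd.DataFrame(
    {'perf': [np.nan, 0.012923, 0.039364, 0.042805, -0.033214, -0.021745]},
    index=index,
)
performance.index.name = 'date'

start_value = 100.0  # assumed starting level, taken from the question

# A missing perf counts as 0 growth, so the base row stays at start_value
growth = 1 + performance['perf'].fillna(0)
performance['pct_change'] = start_value * growth.cumprod()

print(performance)
```

Filling the NaN with 0 keeps everything in a single vectorised expression, so the starting row does not have to be concatenated onto the result afterwards.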


