Efficient way to create column referencing its own previous value

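I am trying to generate some columns in a dataframe with a datetime index, based on a rule that references the column's own previous value. I have tried the for loop below over the length of df, but I am looking for a cleaner solution if possible.

Ultimately, I want to compute statistics on the generated columns (C, D and E in the example below) for a large number of input columns A, B, ...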

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(30, 2), columns=list('AB'))
reset_level = 0.5
df['diff'] = df['A'].diff()
df['C'], df['D'], df['E'] = [0.0, 0.0, 0.0]

for i in range(1,len(df)):
    if abs(df.iloc[i-1]['C'] + df.iloc[i]['diff']) > (reset_level):
        df.iat[i,3] = 0.000
        df.iat[i,4] = (df.iloc[i-1]['C'] + df.iloc[i]['diff'])
    else:
        df.iat[i,3] = (df.iloc[i-1]['C'] + df.iloc[i]['diff'])
        df.iat[i,4] = 0.000 
    df.iat[i,5] = 0.5 * df.iloc[i]['D'] * df.iloc[i]['D'] 
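
For reference, the rule this loop implements can be written as a recurrence. This is only a restatement of the code above, with $r$ standing for `reset_level` and $s_i$ introduced here as shorthand for the running sum; all three columns start at zero in row 0.

```latex
s_i = C_{i-1} + \mathrm{diff}_i, \qquad
C_i = \begin{cases} 0 & \text{if } |s_i| > r \\ s_i & \text{otherwise} \end{cases}, \qquad
D_i = \begin{cases} s_i & \text{if } |s_i| > r \\ 0 & \text{otherwise} \end{cases}, \qquad
E_i = \tfrac{1}{2} D_i^2
```

Because $C_i$ depends on $C_{i-1}$, each row needs the result of the row before it.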

Edit: expected output added below

         A        B         diff        C             D                 E
0   -0.352725   1.429037    NaN         0.000000    0.000000    0.000000
1   -1.024418   -0.644302   -0.671693   0.000000    -0.671693   0.225585
2   0.401065    0.419555    1.425483    0.000000    1.425483    1.016001
3   -1.302484   0.724320    -1.703549   0.000000    -1.703549   1.451039
4   0.427035    0.835221    1.729518    0.000000    1.729518    1.495617
5   0.158694    -0.416741   -0.268340   -0.268340   0.000000    0.000000
6   0.921985    -0.490635   0.763291    0.494951    0.000000    0.000000
7   -0.835297   -1.036580   -1.757282   0.000000    -1.262331   0.796740
8   0.752060    -0.279206   1.587356    0.000000    1.587356    1.259850
9   1.795306    -1.554886   1.043246    0.000000    1.043246    0.544181
10  -0.405100   -0.361454   -2.200406   0.000000    -2.200406   2.420893
11  -0.253629   -0.627245   0.151471    0.151471    0.000000    0.000000
12  -0.820573   -0.212886   -0.566944   -0.415473   0.000000    0.000000
13  0.473439    2.532487    1.294012    0.000000    0.878539    0.385916
14  -1.395435   1.016338    -1.868875   0.000000    -1.868875   1.746346
15  -0.244269   -0.337820   1.151166    0.000000    1.151166    0.662592
16  -2.084977   -1.262249   -1.840708   0.000000    -1.840708   1.694103
17  0.666323    -1.696245   2.751300    0.000000    2.751300    3.784825
18  0.235207    -0.513903   -0.431115   -0.431115   0.000000    0.000000
19  1.386456    -0.149153   1.151249    0.000000    0.720134    0.259296
20  0.093456    -0.298154   -1.293000   0.000000    -1.293000   0.835925
21  0.690499    -1.687416   0.597043    0.000000    0.597043    0.178230
22  1.287530    -1.390260   0.597031    0.000000    0.597031    0.178223
23  1.828138    -0.288829   0.540608    0.000000    0.540608    0.146128
24  0.209666    -0.903385   -1.618472   0.000000    -1.618472   1.309727
25  -1.010678   0.615569    -1.220344   0.000000    -1.220344   0.744619
26  -1.799800   1.536332    -0.789122   0.000000    -0.789122   0.311357
27  0.611096    -1.033066   2.410896    0.000000    2.410896    2.906209
28  -0.532675   -0.091541   -1.143770   0.000000    -1.143770   0.654105
29  2.468137    -1.046117   3.000811    0.000000    3.000811    4.502435
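
As a quick sanity check on the table above, the rule can be applied by hand to two consecutive rows. The sketch below is only an illustration: `step` is a hypothetical helper name, not part of the original code, and the input numbers are copied from rows 5 to 7 of the expected output.

```python
def step(prev_c, diff, reset_level=0.5):
    """Hypothetical helper: apply the rule to one row and return (C, D, E)."""
    total = prev_c + diff
    if abs(total) > reset_level:
        c, d = 0.0, total   # threshold exceeded: reset C, record the sum in D
    else:
        c, d = total, 0.0   # otherwise keep accumulating in C
    return c, d, 0.5 * d * d


# Row 6 of the table: C[5] = -0.268340, diff[6] = 0.763291
row6 = step(-0.268340, 0.763291)
# Row 7 of the table: diff[7] = -1.757282, starting from row 6's C
row7 = step(row6[0], -1.757282)

print(row6)
print(row7)
```

Rounded to six decimals, row 6 should give C = 0.494951 with D = E = 0, and row 7 should give C = 0, D = -1.262331 and E = 0.796740, matching the table. Row 7 only comes out right because it starts from the C value produced in row 6.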

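I converted the for loop to use a numpy array that holds the condition, then used np.where to replace the values based on that condition:

  1. Define the condition array
condition = np.abs(df.C.shift() + df["diff"]) > reset_level
  2. Replace the values based on the condition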
df.iloc[:, 3] = np.where(condition, np.zeros((df.shape[0])), (df['C'].shift() + df['diff']))
df.iloc[:, 4] = np.where(~condition, np.zeros((df.shape[0])), (df['C'].shift() + df['diff']))

df.iloc[:, 5] = 0.5 * df['D'] * df['D']

Output:

           A         B      diff         C         D         E
0  -0.432513 -0.259526       NaN       NaN  0.000000  0.000000
1  -1.120872 -1.572850 -0.688360  0.000000       NaN       NaN
2  -0.917555 -2.251316  0.203317  0.203317  0.000000  0.000000
3  -1.869781 -1.284524 -0.952225  0.000000 -0.748908  0.280432
4  -2.041950 -0.091837 -0.172169 -0.172169  0.000000  0.000000
5  -0.142499  0.207746  1.899451  0.000000  1.727282  1.491751
6   1.432833  0.085211  1.575332  0.000000  1.575332  1.240835
7  -2.500191 -0.009907 -3.933025  0.000000 -3.933025  7.734341
8   0.154460 -1.859954  2.654651  0.000000  2.654651  3.523587
9  -0.565057 -0.516736 -0.719517  0.000000 -0.719517  0.258853
10  0.329845  0.127978  0.894902  0.000000  0.894902  0.400425
11 -0.920558  1.254617 -1.250402  0.000000 -1.250402  0.781753
12 -1.396913  0.262378 -0.476355 -0.476355  0.000000  0.000000
13  0.117336 -0.439932  1.514249  0.000000  1.037894  0.538612
14 -0.227066  2.565831 -0.344402 -0.344402  0.000000  0.000000
15  0.077750  0.195277  0.304816  0.304816  0.000000  0.000000
16  1.470611 -0.357213  1.392861  0.000000  1.697677  1.441053
17 -0.553844  0.339270 -2.024455  0.000000 -2.024455  2.049209
18 -0.259603  0.212839  0.294242  0.294242  0.000000  0.000000
19  0.605961  0.279599  0.865564  0.000000  1.159805  0.672574
20 -0.326706 -0.774350 -0.932667  0.000000 -0.932667  0.434934
21 -0.927601 -2.360751 -0.600895  0.000000 -0.600895  0.180537
22 -0.372085  0.986228  0.555516  0.000000  0.555516  0.154299
23 -0.687731 -2.966817 -0.315647 -0.315647  0.000000  0.000000
24 -0.041028 -0.328898  0.646703  0.000000  0.331057  0.054799
25  0.099489  0.275983  0.140517  0.140517  0.000000  0.000000
26  0.468274 -0.287097  0.368785  0.368785  0.000000  0.000000
27  0.497417 -0.588481  0.029143  0.029143  0.000000  0.000000
28  0.603178  2.243163  0.105761  0.105761  0.000000  0.000000
29 -0.643283 -1.051491 -1.246461  0.000000 -1.140700  0.650598

Is this what you are looking for? No expected output was provided.
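
One caveat with the np.where version: it evaluates the whole column at once from a shifted snapshot of C, so the value of C in one row cannot depend on the C just computed for the row before it. In the output above, for example, row 15 has C equal to its diff (0.304816) even though C in row 14 is nonzero. If the row-to-row dependence matters, one option is to keep an explicit loop but run it over plain NumPy arrays, which avoids the per-row `iloc` overhead. The following is a minimal sketch under that assumption; `reset_accumulate` is a hypothetical helper name and the seeded random data is only for illustration.

```python
import numpy as np
import pandas as pd


def reset_accumulate(diff, reset_level):
    """Hypothetical helper: run the question's rule over a 1-D array.

    Returns the C and D arrays; row 0 is left at 0.0, as in the question.
    """
    n = len(diff)
    c = np.zeros(n)
    d = np.zeros(n)
    for i in range(1, n):
        total = c[i - 1] + diff[i]
        if abs(total) > reset_level:
            d[i] = total   # c[i] stays 0.0 (reset)
        else:
            c[i] = total   # d[i] stays 0.0
    return c, d


# Illustrative data, seeded so the run is repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=list("AB"))
reset_level = 0.5
df["diff"] = df["A"].diff()

c, d = reset_accumulate(df["diff"].to_numpy(), reset_level)
df["C"] = c
df["D"] = d
df["E"] = 0.5 * df["D"] ** 2   # E depends only on D, so it vectorizes directly
```

Since the helper takes a plain array, it could be called once per input column (A, B, ...) when statistics over many columns are needed.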


Try this step (but do not iterate over all the rows; it does the whole column for you at once):

df["C_prev"] = df["C"].shift(1)

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(30, 2), columns=list('AB'))
reset_level = 0.5
df['diff'] = df['A'].diff()
df['C'], df['D'], df['E'] = [0.0, 0.0, 0.0]

Then apply a function to each row:

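def f(row):
    # Caveat: df['C'] is only written after apply() finishes, so this lookup
    # always sees the initial 0.0 and does not carry C forward from row to row.
    # row.name - 1 also assumes the default integer index.
    total = df.loc[row.name - 1, 'C'] + row['diff']
    if abs(total) > reset_level:
        C = 0.0
        D = total
    else:
        C = total
        D = 0.0
    E = 0.5 * D * D  # use the D just computed; row['D'] still holds the initial 0.0
    # Label the result so the assignment below aligns on the column names
    return pd.Series([C, D, E], index=['C', 'D', 'E'])

df.loc[1:, ['C', 'D', 'E']] = df[1:].apply(f, axis=1)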
