
Vectorised solution to fill rows of a column based on value from previous row in Python

Dataframe:

import numpy as np
import pandas as pd

# One known starting value followed by ten missing values
df = pd.DataFrame({"X": [1] + [np.nan] * 10})

print(df)
     X
0  1.0
1  NaN
2  NaN
3  NaN
4  NaN
5  NaN

I want to fill each NaN with the square of the previous row's value, plus one.

Desired output:

         X
0       1.0
1       2.0
2       5.0
3      26.0
4     677.0
5  458330.0

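This can be done with a for loop:

col = df.columns.get_loc("X")  # integer position of column "X"
for i in range(1, len(df)):
    # Assign through a single .iloc call to avoid chained assignment
    df.iloc[i, col] = df.iloc[i - 1, col] ** 2 + 1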

But I am looking for a vectorised solution to the same problem.

Unfortunately, a vectorised solution is problematic here, because each value depends on the previous output value. To improve performance, you can use numba:

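from numba import jit

@jit(nopython=True)
def f(a):
    # Each element depends on the previous result, so the loop must run in order
    for i in range(1, a.shape[0]):
        a[i] = a[i-1] ** 2 + 1
    return a

# Pass a writable copy so the compiled function can fill the array in place
df['X'] = f(df['X'].to_numpy(copy=True))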
print (df)
                X
0    1.000000e+00
1    2.000000e+00
2    5.000000e+00
3    2.600000e+01
4    6.770000e+02
5    4.583300e+05
6    2.100664e+11
7    4.412789e+22
8    1.947270e+45
9    3.791862e+90
10  1.437822e+181
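
If numba is not available, the same sequential dependence can be written with itertools.accumulate from the standard library. This is only a minimal sketch, not a vectorised solution: the helper name fill_recurrence is hypothetical, and it assumes n is at least 1 and that the first value is already known.

```python
from itertools import accumulate


def fill_recurrence(first, n):
    """Return n values where each is the previous value squared plus one.

    Hypothetical helper for illustration; assumes n >= 1.
    """
    # accumulate feeds each result back in as `prev`;
    # the range only counts the remaining n - 1 steps.
    return list(
        accumulate(range(n - 1), lambda prev, _: prev ** 2 + 1, initial=first)
    )


values = fill_recurrence(1.0, 6)
print(values)  # [1.0, 2.0, 5.0, 26.0, 677.0, 458330.0]
```

The resulting list can be assigned back with df['X'] = fill_recurrence(df['X'].iloc[0], len(df)). Note that the values grow very quickly: a 12th value would exceed the float64 range, and with plain Python floats the ** operator raises OverflowError at that point.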


 