
Calculating a new column in a pandas DataFrame from row-by-row calculations

I am learning Python and have come up with a way to calculate values row by row, but I am sure there is a more elegant (and quicker) solution. Here is a simple example:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10,3), columns=list('abc'))
df.head()

    a   b   c
0   0.207455    0.257266    0.453369
1   0.518193    0.816898    0.141986
2   0.430085    0.490554    0.797655
3   0.369860    0.251664    0.777059
4   0.390059    0.983218    0.966202

from math import sqrt
df['d']=''
df['e']=''
for i in range(1,len(df)):
    df.loc[i, 'd'] = sqrt((df['a'][i]-df['b'][i])**2+(df['a'][i-1]-df['b'][i-1])**2)
    df.loc[i, 'e'] = (df['c'][i]-df['c'][i-1])*1609

df.head()

    a   b   c   d   e
0   0.207455    0.257266    0.453369        
1   0.518193    0.816898    0.141986    0.30283 -501.015
2   0.430085    0.490554    0.797655    0.304765    1054.97
3   0.369860    0.251664    0.777059    0.132766    -33.1396
4   0.390059    0.983218    0.966202    0.60482 304.331

Is there a better way to do this? I am working with some large datasets and it takes a while to run it this way.

Yes. Use shift and diff, which operate on entire columns at once, so no for loop is needed:

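# shift() moves each column down one row, so row i sees the values from row i-1
df['d'] = ((df['a'] - df['b']) ** 2 + (df['a'].shift() - df['b'].shift()) ** 2)**0.5
# diff() computes c[i] - c[i-1] for every row
df['e'] = df['c'].diff() * 1609
df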
          a         b         c         d            e
0  0.207455  0.257266  0.453369       NaN          NaN
1  0.518193  0.816898  0.141986  0.302830  -501.015247
2  0.430085  0.490554  0.797655  0.304764  1054.971421
3  0.369860  0.251664  0.777059  0.132766   -33.138964
4  0.390059  0.983218  0.966202  0.604821   304.331087
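A minimal sketch to check that the shift/diff version reproduces the row-by-row loop, assuming a default integer index (as in the question's frame). The helper names loop_version and vectorized_version are hypothetical, the loop uses .loc instead of chained assignment, and np.sqrt stands in for math.sqrt; a seeded generator is used only so the check is repeatable.

```python
import numpy as np
import pandas as pd

# Tiny illustration: shift() moves values down one row, diff() subtracts the previous row
s = pd.Series([1.0, 4.0, 9.0])
print(s.shift().tolist())  # [nan, 1.0, 4.0]
print(s.diff().tolist())   # [nan, 3.0, 5.0]

def loop_version(df):
    """Hypothetical helper: the question's row-by-row loop, written with .loc."""
    out = df.copy()
    out['d'] = np.nan
    out['e'] = np.nan
    for i in range(1, len(out)):
        out.loc[i, 'd'] = np.sqrt((out.loc[i, 'a'] - out.loc[i, 'b']) ** 2
                                  + (out.loc[i - 1, 'a'] - out.loc[i - 1, 'b']) ** 2)
        out.loc[i, 'e'] = (out.loc[i, 'c'] - out.loc[i - 1, 'c']) * 1609
    return out

def vectorized_version(df):
    """Hypothetical helper: the same calculation using shift() and diff() on whole columns."""
    out = df.copy()
    diff_ab = out['a'] - out['b']
    out['d'] = np.sqrt(diff_ab ** 2 + diff_ab.shift() ** 2)
    out['e'] = out['c'].diff() * 1609
    return out

rng = np.random.default_rng(0)  # seeded so the comparison is reproducible
df = pd.DataFrame(rng.random((1000, 3)), columns=list('abc'))

slow = loop_version(df)
fast = vectorized_version(df)

# Both leave NaN in row 0 and should agree everywhere else
print(np.allclose(slow[['d', 'e']], fast[['d', 'e']], equal_nan=True))  # True
```

Computing a - b once and shifting that difference is equivalent to shifting a and b separately, as in the answer above; either form avoids the Python-level loop over rows.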
