简体   繁体   中英

Update multiple columns per row with loop through pandas dataframe

I've reviewed several posts on here about better ways to loop through dataframes, but can't seem to figure out how to apply them to my specific situation.

I have a dataframe of about 2M rows and I need to calculate six statistics for each row, one per column. There are 3 columns so 18 total. However, the issue is that I need to update those stats using a sample of the dataframe so that the mean/median, etc is different per row.

Here's what I have so far:

r = 0
for i in imputed_df.iterrows():
    t = imputed_df.sample(n=10)
    for (columnName) in cols:
        imputed_df.loc[r,columnName + '_mean'] = t[columnName].mean()
        imputed_df.loc[r,columnName + '_var'] = t[columnName].var()
        imputed_df.loc[r,columnName + '_std'] = t[columnName].std()
        imputed_df.loc[r,columnName + '_skew'] = t[columnName].skew()
        imputed_df.loc[r,columnName + '_kurt'] = t[columnName].kurt()
        imputed_df.loc[r,columnName + '_med'] = t[columnName].median()

But this has been running for two days without finishing. I tried to take a subset of 2000 rows from the original dataframe and even that one has been running for hours.

Is there a better way to do this?

EDIT: Added a sample dataset of what it should look like. each suffixed column should have the calculated value of the subset of 10 rows.

    timestamp   activityID  w2  w3  w4
0   41.21   1.0     -1.34587    9.57245     2.83571
1   41.22   1.0     -1.76211    10.63590    2.59496
2   41.23   1.0     -2.45116    11.09340    2.23671
3   41.24   1.0     -2.42381    11.88590    1.77260
4   41.25   1.0     -2.31581    12.45170    1.50289

The problem is that you do the operation for each column using unnecessary loops. We could use DataFrame.agg with DataFrame.unstack and Series.set_axis to get correct names of columns.

Setup

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 10, (10, 100))).add_prefix('col')

new_serie = df.agg(['sum', 'mean', 
                    'var', 'std', 
                    'skew', 'kurt', 'median']).unstack()
new_df = pd.concat([df, new_serie.set_axis([f'{x}_{y}'
                                            for x, y in new_serie.index])
                                  .to_frame().T], axis=1)

# if new_df already exist:
#new_df.loc[0, :] = new_serie.set_axis([f'{x}_{y}' for x, y in new_serie.index])

   col0  col1  col2  col3  col4  col5  col6  col7  col8  col9  ...  \
0     8     7     6     7     6     5     8     7     8     4  ...   
1     8     1     8     7     0     8     8     4     6     1  ...   
2     5     6     3     5     4     9     3     0     2     5  ...   
3     3     3     3     3     5     4     5     1     3     5  ...   
4     7     9     4     5     6     7     0     3     4     6  ...   
5     0     5     2     0     8     0     3     7     6     5  ...   
6     7     0     1     4     8     9     4     9     2     9  ...   
7     0     6     1     0     6     1     3     0     3     4  ...   
8     3     6     1     8     3     0     7     6     8     6  ...   
9     2     5     8     5     8     4     9     1     9     9  ...   

   col98_skew  col98_kurt  col98_median  col99_sum  col99_mean  col99_var  \
0    0.456435   -0.939607           3.0       39.0         3.9   6.322222   
1         NaN         NaN           NaN        NaN         NaN        NaN   
2         NaN         NaN           NaN        NaN         NaN        NaN   
3         NaN         NaN           NaN        NaN         NaN        NaN   
4         NaN         NaN           NaN        NaN         NaN        NaN   
5         NaN         NaN           NaN        NaN         NaN        NaN   
6         NaN         NaN           NaN        NaN         NaN        NaN   
7         NaN         NaN           NaN        NaN         NaN        NaN   
8         NaN         NaN           NaN        NaN         NaN        NaN   
9         NaN         NaN           NaN        NaN         NaN        NaN   

   col99_std  col99_skew  col99_kurt  col99_median  
0   2.514403    0.402601    1.099343           4.0  
1        NaN         NaN         NaN           NaN  
2        NaN         NaN         NaN           NaN  
3        NaN         NaN         NaN           NaN  
4        NaN         NaN         NaN           NaN  
5        NaN         NaN         NaN           NaN  
6        NaN         NaN         NaN           NaN  
7        NaN         NaN         NaN           NaN  
8        NaN         NaN         NaN           NaN  
9        NaN         NaN         NaN           NaN 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM