
Poor performance using .loc on a DataFrame

I'm trying to combine certain columns by index in this dataframe, which I built with a simple cov() call to calculate the variances and covariances of u_centro, v_centro and w_centro.


However, when I try to slice some of these values using .loc, it is very slow. For example:

df_uu = df.loc[(iz_centro,'u_centro'),'u_centro']

where I want all the combinations of u_centro by u_centro. The result is exactly what I wanted, but the time it takes to complete is absurd: more than 10 minutes.


The whole dataset: https://raw.githubusercontent.com/AlessandroMDO/LargeEddySimulation/master/sd.csv

There are different ways to do this, but the best performance comes from vectorized approaches such as xs (thanks @Paul H) or boolean masks. For example:

 from datetime import datetime

 start_time = datetime.now()

 # Boolean mask: True where the second index level is 'u_centro'
 mask = df.index.get_level_values(1) == 'u_centro'
 df.loc[mask]

 print(datetime.now() - start_time) # 0:00:00.001417

I don't know whether 1417 µs (about 1.4 ms) is a big deal in this case.
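For comparison, here is a minimal sketch of the xs route next to the boolean mask. The frame below is a small synthetic stand-in, not the data from the question: it assumes a two-level index of (iz_centro, variable) with one column per variable, and the level name "variable" and the random values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the covariance frame in the question:
# rows indexed by (iz_centro, variable), one column per variable.
variables = ["u_centro", "v_centro", "w_centro"]
index = pd.MultiIndex.from_product(
    [range(4), variables], names=["iz_centro", "variable"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(len(index), len(variables))),
    index=index,
    columns=variables,
)

# Cross-section: every row whose second index level is 'u_centro'.
# drop_level=False keeps both index levels, so the result lines up
# with what the boolean mask returns.
uu_xs = df.xs("u_centro", level=1, drop_level=False)["u_centro"]

# Same selection with a boolean mask on the second index level.
mask = df.index.get_level_values(1) == "u_centro"
uu_mask = df.loc[mask, "u_centro"]
```

Both selections should give the same u_centro by u_centro values, one per iz_centro. Leaving drop_level at its default would drop the 'u_centro' level and index the result by iz_centro alone.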
