Poor performance using .loc in Dataframe

Question

I'm trying to combine certain columns by index of this DataFrame, which I built with a simple call to pandas' .cov() method to calculate the variances and covariances of u_centro, v_centro and w_centro.

However, when I try to slice some of these values using .loc, it is very slow. For example:

df_uu = df.loc[(iz_centro,'u_centro'),'u_centro']

where I want all the combinations of u_centro with u_centro. The result is exactly what I want, but the time it takes is absurd: more than 10 minutes.

The whole data: https://raw.githubusercontent.com/AlessandroMDO/LargeEddySimulation/master/sd.csv

Answer 1

There are different ways to do this, but the best performance comes from vectorized operations such as xs (thanks @Paul H) or boolean masks. For example:

 from datetime import datetime

 start_time = datetime.now()

 # Select every row whose second index level is 'u_centro'
 mask = df.index.get_level_values(1) == 'u_centro'
 df.loc[mask]

 print(datetime.now() - start_time)  # 0:00:00.001417

I don't know whether 1417 µs is a big deal in this case.
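For completeness, here is a minimal sketch of both approaches (xs and a boolean mask) side by side. It does not use the real sd.csv: the `raw` frame below is a hypothetical stand-in with random values, and I am assuming the covariance frame was built with something like groupby('iz_centro').cov(), which gives a two-level index of (iz_centro, variable name).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: three velocity components
# sampled at four iz_centro levels (random values, not the real CSV).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "iz_centro": np.repeat(np.arange(4), 50),
    "u_centro": rng.normal(size=200),
    "v_centro": rng.normal(size=200),
    "w_centro": rng.normal(size=200),
})

# Assumed construction: one covariance matrix per iz_centro level.
# The result is indexed by (iz_centro, variable name).
df = raw.groupby("iz_centro").cov()

# Option 1: cross-section on the second index level, then pick the column.
uu_xs = df.xs("u_centro", level=1)["u_centro"]

# Option 2: boolean mask on the second index level, then pick the column.
mask = df.index.get_level_values(1) == "u_centro"
uu_mask = df.loc[mask, "u_centro"]
```

Both give the u_centro by u_centro entries (the variance of u_centro at each iz_centro). The only difference is the index: xs drops the second level by default, while the mask keeps the full (iz_centro, 'u_centro') index.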

1 answer

solution1
2 ACCEPTED 2020-04-22 18:31:26
