Efficient symmetrical matrix data readout

Question

I have a df filled with a symmetrical matrix of data. Now I want to extract the data labels (positions of the data) whose values are above some threshold. In normal "for-loop" thinking, I'd do something like this:

for j in range(df.shape[0]): 
    for i in range(j):
        if (df.iloc[j, i] >= value):
            print(df.index.values[i]) #do something
            print(df.index.values[j])

This approach with two for loops works correctly, but it is far too slow. The data size varies and can reach a few hundred or a few thousand rows and columns. Therefore I would like to process the data in a more efficient way:

import pandas as pd
import numpy as np

from scipy.spatial.distance import squareform

value = 0.6

df = pd.DataFrame(squareform(np.random.rand(10)))

print(df)

result = []

for j in range(df.shape[0]):
    result = np.where(value <= df.loc[j, :])

    for element in range(len(result[0])):
        print(result[0][element]) # do something

This basically works. But my problem now is that I don't understand how to ignore one symmetrical half of the matrix, since I only need half of the matrix for further processing.

numpy.where() does speed up the process, but it confuses me, since it reads out one whole row of df and I don't know how to proceed properly. If I used result as it is, I'd get every position twice, which is unnecessary.

Is there maybe a thing I just don't see? Or maybe an even better way?

Thanks in advance

Answer 1

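You can use numpy's ndenumerate, which yields the (row, column) position together with each value:

import numpy as np
import pandas as pd

df = pd.DataFrame([
    [1, 2, 3, 4],
    [2, 3, 4, 5]
])

# (i, j) is the positional index of the cell, v is its value
for (i, j), v in np.ndenumerate(df):
    print(i, j, v)  # do something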
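
Note that np.ndenumerate still visits every cell in a Python loop, including both symmetrical halves. As a rough sketch of a vectorized alternative (not part of the answer above), a boolean mask can be restricted to the strict lower triangle with np.tril, which mirrors the `for i in range(j)` loop from the question. The 4x4 matrix, the labels "a" to "d", and the name `pairs` are made up for illustration only.

```python
import numpy as np
import pandas as pd

value = 0.6

# Hypothetical symmetric example data; the labels are illustrative.
labels = ["a", "b", "c", "d"]
data = np.array([
    [0.0, 0.7, 0.2, 0.9],
    [0.7, 0.0, 0.6, 0.1],
    [0.2, 0.6, 0.0, 0.3],
    [0.9, 0.1, 0.3, 0.0],
])
df = pd.DataFrame(data, index=labels, columns=labels)

arr = df.to_numpy()

# Compare the whole matrix at once, then keep only the strict lower
# triangle (k=-1 drops the diagonal), so each pair appears once.
mask = np.tril(arr >= value, k=-1)

# Positions of the True cells: rows play the role of j, cols of i.
rows, cols = np.nonzero(mask)

# Translate positions back to the data labels.
pairs = list(zip(df.index[cols], df.index[rows]))
print(pairs)  # [('a', 'b'), ('b', 'c'), ('a', 'd')]
```

The comparison and the triangle selection both run inside numpy, so only the matching pairs are left for any remaining Python-level loop. If the diagonal should be included, k=0 would be the corresponding setting.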
