
Efficient symmetrical matrix data readout

I have a DataFrame df filled with a symmetric matrix of data. Now I want to extract the labels (positions) of the entries which are above some value. In normal "for-loop" thinking, I'd do something like this:

for j in range(df.shape[0]): 
    for i in range(j):
        if (df.iloc[j, i] >= value):
            print(df.index.values[i]) #do something
            print(df.index.values[j])

This approach with two for loops works correctly. The problem is that it is way too slow. The data size can vary, from a few hundred to a few thousand rows and columns. Therefore I would like to use a more efficient way to process the data:

import pandas as pd
import numpy as np

from scipy.spatial.distance import squareform

value = 0.6

df = pd.DataFrame(squareform(np.random.rand(10)))

print(df)

result = []

for j in range(df.shape[0]):
    result = np.where(value <= df.loc[j, :]) 


    for element in range(len(result[0])):
        print(result[0][element]) # do something

This basically works. But my problem now is that I don't understand how to ignore one symmetric half of the matrix, since I only need half of the matrix for further processing.

numpy.where() does speed up the process, but it confuses me, since it reads out one whole row of df and I don't know how to proceed properly. If I used result as it is, I'd get every position twice, which is unnecessary.

Is there maybe a thing I just don't see? Or maybe an even better way?

Thanks in advance

numpy ndenumerate

df = pd.DataFrame([
    [1, 2, 3, 4],
    [2, 3, 4, 5]
])

# Iterate over every cell; (i, j) is the position and v the value
for (i, j), v in np.ndenumerate(df):
    print(i, j, v)  # do something
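
Note that ndenumerate still visits every cell, including both symmetric halves. As a possible alternative, here is a minimal sketch that avoids the Python loop over cells and keeps only the part strictly below the diagonal. It assumes df is square with the same labels on rows and columns, as in the question; the names lower and pairs are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import squareform

value = 0.6

# Same kind of test data as in the question: a symmetric 5x5 matrix
rng = np.random.default_rng(0)
df = pd.DataFrame(squareform(rng.random(10)))

arr = df.to_numpy()

# Boolean mask that is True only strictly below the diagonal (k=-1),
# so each symmetric pair is considered exactly once.
lower = np.tril(np.ones(arr.shape, dtype=bool), k=-1)

# Row/column positions in that half where the threshold is met
rows, cols = np.where(lower & (arr >= value))

# Translate positions back to labels (assumes rows and columns share labels)
pairs = list(zip(df.index[cols], df.index[rows]))

for label_i, label_j in pairs:
    print(label_i, label_j)  # do something
```

The pairs come out in the same order as the nested for loops in the question would produce them. If the diagonal should be included as well, k=0 would be the value to try.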
