Get all component stats of multiple arrays labeled by one of them

I already asked a similar question, which got answered, but this one goes into more detail:

I need a really fast way to get all the important component stats of two arrays, where one array is labeled by OpenCV (cv2) and provides the component areas for both arrays. The stats for all components, masked on both arrays, should then be saved to a dictionary. My approach works, but it is much too slow. Is there a way to avoid the loop, or a better approach than ndimage.labeled_comprehension?

from scipy import ndimage
import numpy as np
import cv2

def calculateMeanMaxMin(val):
    return np.array([np.mean(val),np.max(val),np.min(val)])

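def getTheStatsForComponents(array1,array2):
    ret, thresholded= cv2.threshold(array2, 120, 255, cv2.THRESH_BINARY)
    thresholded= thresholded.astype(np.uint8)
    # ltype must be CV_32S or CV_16U; label 0 is the background, so components are 1 .. numLabels-1
    numLabels, labels, stats, centroids = cv2.connectedComponentsWithStats(thresholded, 8, cv2.CV_32S)
    componentLabels = np.arange(1, numLabels)
    # object dtype, because calculateMeanMaxMin returns a 3-element array per component
    meanmaxminArray2 = ndimage.labeled_comprehension(array2, labels, componentLabels, calculateMeanMaxMin, object, 0)
    meanmaxminArray1 = ndimage.labeled_comprehension(array1, labels, componentLabels, calculateMeanMaxMin, object, 0)
    allComponentStats = {}
    for position, label in enumerate(componentLabels):
        currentLabel = np.uint8(labels== label)
        contour, _ = cv2.findContours(currentLabel, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
        componentStat = stats[label]
        # the original dict was cut off after 'height'; the two array keys below are assumed names
        allstats = {'position':centroids[label,:],'area':componentStat[cv2.CC_STAT_AREA],
                    'height':componentStat[cv2.CC_STAT_HEIGHT],
                    'meanmaxminArray1':meanmaxminArray1[position],
                    'meanmaxminArray2':meanmaxminArray2[position]}
        # assumption: side1/side2 (undefined in the original) are the sides of the minimum-area rectangle
        _, (side1, side2), _ = cv2.minAreaRect(np.vstack(contour))
        if side1 >= side2 and side1 > 0:
            allstats['elongation'] = np.float32(side2 / side1)
        elif side2 > side1 and side2 > 0:
            allstats['elongation'] = np.float32(side1 / side2)
        else:
            allstats['elongation'] = np.float32(0)
        allComponentStats[int(label)] = allstats
    return allComponentStats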


The two arrays are 2D arrays:

array1= np.random.choice(255,(512,512)).astype(np.uint8)
array2= np.random.choice(255,(512,512)).astype(np.uint8)
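labeled_comprehension calls the supplied Python function once per label, so one possible alternative for the mean/max/min part is sketched below (not from the original post): scipy.ndimage.mean, ndimage.maximum and ndimage.minimum accept an index sequence and evaluate all labels in one call each. To stay runnable without OpenCV, the sketch labels the thresholded image with scipy.ndimage.label using a 3×3 structuring element (8-connectivity) as a stand-in for cv2.connectedComponentsWithStats; the helper component_mean_max_min and the dictionary key names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def component_mean_max_min(values, labels, num_labels):
    """Hypothetical helper: (mean, max, min) per foreground label, shape (num_labels - 1, 3).

    Label 0 is treated as background, matching OpenCV's convention.
    """
    ids = np.arange(1, num_labels)  # foreground labels 1 .. num_labels-1
    mean = np.asarray(ndimage.mean(values, labels, ids), dtype=np.float64)
    mx = np.asarray(ndimage.maximum(values, labels, ids), dtype=np.float64)
    mn = np.asarray(ndimage.minimum(values, labels, ids), dtype=np.float64)
    return np.column_stack([mean, mx, mn])

rng = np.random.default_rng(0)
array1 = rng.integers(0, 255, (512, 512), dtype=np.uint8)
array2 = rng.integers(0, 255, (512, 512), dtype=np.uint8)

# Stand-in for cv2.threshold(array2, 120, 255, THRESH_BINARY) + connectedComponentsWithStats:
# 8-connected labeling of the pixels above 120
labels, n_components = ndimage.label(array2 > 120, structure=np.ones((3, 3), dtype=int))
num_labels = n_components + 1  # count the background as OpenCV does

stats1 = component_mean_max_min(array1, labels, num_labels)
stats2 = component_mean_max_min(array2, labels, num_labels)

# One dictionary entry per component, as the question asks for
all_component_stats = {
    label: {"meanmaxminArray1": stats1[label - 1], "meanmaxminArray2": stats2[label - 1]}
    for label in range(1, num_labels)
}
```

The per-component geometry (contour, elongation) still needs its own treatment; this only replaces the two labeled_comprehension calls.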


A small example with two data arrays and a labelArray containing two components (1 and 2, with background 0). The minimum (and likewise the max and mean) is calculated with ndimage.labeled_comprehension:

from scipy import ndimage
import numpy as np

labelArray = np.array([[0,1,1,1],[2,2,1,1],[2,2,0,1]])
data = np.array([[0.1,0.2,0.99,0.2],[0.34,0.43,0.87,0.33],[0.22,0.53,0.1,0.456]])
data2 = np.array([[0.1,0.2,0.99,0.2],[0.1,0.2,0.99,0.2],[0.1,0.2,0.99,0.2]])
numLabels = 2

minimumDataForAllLabels = ndimage.labeled_comprehension(data, labelArray, np.arange(1, numLabels+1), np.min, np.ndarray, 0)
minimumData2ForallLabels = ndimage.labeled_comprehension(data2, labelArray, np.arange(1, numLabels+1), np.min, np.ndarray, 0)

Output:

[0.2 0.22] # minimum of components 1 and 2 from data
[0.2 0.1] # minimum of components 1 and 2 from data2
[0.1  0.2  0.22] # minima from bin_and_do_simple_stats (answer below) for data; the first value is the background (label 0)
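As a quick cross-check on the small example (a sketch, not part of the original question), the vectorized scipy.ndimage.minimum, ndimage.maximum and ndimage.mean reproduce the labeled_comprehension results. Including label 0 in the index also explains the third output line, whose first entry is the background minimum.

```python
import numpy as np
from scipy import ndimage

labelArray = np.array([[0, 1, 1, 1], [2, 2, 1, 1], [2, 2, 0, 1]])
data = np.array([[0.1, 0.2, 0.99, 0.2], [0.34, 0.43, 0.87, 0.33], [0.22, 0.53, 0.1, 0.456]])
data2 = np.array([[0.1, 0.2, 0.99, 0.2], [0.1, 0.2, 0.99, 0.2], [0.1, 0.2, 0.99, 0.2]])

components = np.arange(1, 3)       # foreground labels 1 and 2
with_background = np.arange(0, 3)  # labels 0, 1 and 2

min_data = np.asarray(ndimage.minimum(data, labelArray, components))            # values 0.2, 0.22
min_data2 = np.asarray(ndimage.minimum(data2, labelArray, components))          # values 0.2, 0.1
min_data_all = np.asarray(ndimage.minimum(data, labelArray, with_background))   # values 0.1, 0.2, 0.22
max_data = np.asarray(ndimage.maximum(data, labelArray, components))
mean_data = np.asarray(ndimage.mean(data, labelArray, components))

print(min_data, min_data2, min_data_all, max_data, mean_data)
```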

labeled_comprehension is definitely slow, since it calls the Python function once per label.

The simple stats, at least, can be computed much faster using the approach from the linked post. For simplicity I'm only handling one data array, but since the procedure returns the sort indices, it can easily be extended to multiple arrays:

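import numpy as np
from scipy import sparse

# use the compiled Pythran kernel (stb_pthr.py, see below) if it is available
try:
    from stb_pthr import sort_to_bins as _stb_pthr
    HAVE_PYTHRAN = True
except ImportError:
    HAVE_PYTHRAN = False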

# fallback if pythran not available

def sort_to_bins_sparse(idx, data, mx=-1):
    if mx==-1:
        mx = idx.max() + 1    
    aux = sparse.csr_matrix((data, idx, np.arange(len(idx)+1)), (len(idx), mx)).tocsc()
    return aux.data, aux.indices, aux.indptr

def sort_to_bins_pythran(idx, data, mx=-1):
    indices, indptr = _stb_pthr(idx, mx)
    return data[indices], indices, indptr

# pick best available

sort_to_bins = sort_to_bins_pythran if HAVE_PYTHRAN else sort_to_bins_sparse

# example data

idx = np.random.randint(0,10,(100000))
data = np.random.random(100000)

# if possible compare the two methods

if HAVE_PYTHRAN:
    dsp,isp,psp = sort_to_bins_sparse(idx,data)
    dph,iph,pph = sort_to_bins_pythran(idx,data)

    assert (dsp==dph).all()
    assert (isp==iph).all()
    assert (psp==pph).all()

# example how to do simple vectorized calculations

def simple_stats(data,iptr):
    min = np.minimum.reduceat(data,iptr[:-1])
    mean = np.add.reduceat(data,iptr[:-1]) / np.diff(iptr)
    return min, mean

def bin_and_do_simple_stats(idx,data,mx=-1):
    data,indices,indptr = sort_to_bins(idx,data,mx)
    return simple_stats(data,indptr)

print("minima: {}\n mean values: {}".format(*bin_and_do_simple_stats(idx,data)))
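A minimal sketch of the multi-array extension mentioned above, assuming all arrays share one label image: sort the flattened labels once with the sparse trick, then reuse the returned permutation (the indices) to gather each data array and reduce it per bin. The helper stats_for_arrays is hypothetical, and it assumes every label from 0 to labels.max() occurs at least once (true for connected-component labels); empty bins would make reduceat return wrong values.

```python
import numpy as np
from scipy import sparse

def sort_to_bins_sparse(idx, data, mx=-1):
    # Same trick as in the answer: one entry per CSR row with column = label;
    # converting to CSC groups the entries by label.
    if mx == -1:
        mx = idx.max() + 1
    aux = sparse.csr_matrix((data, idx, np.arange(len(idx) + 1)), (len(idx), mx)).tocsc()
    return aux.data, aux.indices, aux.indptr

def stats_for_arrays(labels, *arrays):
    """Hypothetical helper: per-label (min, max, mean) for each array, sorting only once."""
    idx = labels.ravel()
    # Sort once; the data argument is a dummy, only the permutation and bin boundaries are kept
    _, order, indptr = sort_to_bins_sparse(idx, np.ones(idx.size))
    starts = indptr[:-1]
    counts = np.diff(indptr)
    results = []
    for arr in arrays:
        vals = arr.ravel()[order].astype(np.float64)  # float avoids uint8 overflow in the sum
        results.append((np.minimum.reduceat(vals, starts),
                        np.maximum.reduceat(vals, starts),
                        np.add.reduceat(vals, starts) / counts))
    return results

labelArray = np.array([[0, 1, 1, 1], [2, 2, 1, 1], [2, 2, 0, 1]])
data = np.array([[0.1, 0.2, 0.99, 0.2], [0.34, 0.43, 0.87, 0.33], [0.22, 0.53, 0.1, 0.456]])
data2 = np.array([[0.1, 0.2, 0.99, 0.2], [0.1, 0.2, 0.99, 0.2], [0.1, 0.2, 0.99, 0.2]])

# Results are indexed by label, including the background label 0
(min1, max1, mean1), (min2, max2, mean2) = stats_for_arrays(labelArray, data, data2)
print(min1, min2)
```

The extra cost per additional array is one gather plus three reduceat passes; the sort is paid only once.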

If you have Pythran (not required, but a bit faster), save this as stb_pthr.py and compile it with Pythran:

import numpy as np

#pythran export sort_to_bins(int[:], int)

def sort_to_bins(idx, mx):
    if mx==-1:
        mx = idx.max() + 1
    cnts = np.zeros(mx + 2, int)
    for i in range(idx.size):
        cnts[idx[i]+2] += 1
    for i in range(2, cnts.size):
        cnts[i] += cnts[i-1]
    res = np.empty_like(idx)
    for i in range(idx.size):
        res[cnts[idx[i]+1]] = i
        cnts[idx[i]+1] += 1
    return res, cnts[:-1]
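For readers without Pythran, here is a minimal NumPy-only sketch with the same return contract as the kernel above (positions grouped by bin, plus indptr); the name sort_to_bins_numpy is hypothetical. A stable argsort reproduces the counting sort's within-bin order, but it generally costs O(n log n) rather than O(n), so it is probably slower than the compiled loop on large inputs.

```python
import numpy as np

def sort_to_bins_numpy(idx, mx=-1):
    """Hypothetical NumPy-only stand-in for the Pythran sort_to_bins.

    Returns (res, indptr): res lists the positions of idx grouped by bin value in
    ascending order, and res[indptr[k]:indptr[k + 1]] are the positions in bin k.
    """
    if mx == -1:
        mx = idx.max() + 1
    # A stable sort keeps the original order inside each bin, like the counting sort
    res = np.argsort(idx, kind="stable")
    counts = np.bincount(idx, minlength=mx)
    indptr = np.concatenate(([0], np.cumsum(counts)))
    return res, indptr

idx = np.array([2, 0, 1, 2, 0, 2])
res, indptr = sort_to_bins_numpy(idx)
print(res, indptr)
```

Because the wrapper sort_to_bins_pythran only uses the two returned arrays, a function like this could in principle be bound to _stb_pthr in the ImportError branch.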


