Efficiently masking an image using a label mask

Question

I have an image that I read in with tifffile.imread, which returns a 3D array: the first dimension is the Y coordinate, the second is the X coordinate, and the third is the channel of the image (these images are not RGB, so there can be an arbitrary number of channels).

Each of these images has a label mask, a 2D array that indicates the position of objects in the image. In the label mask, pixels with a value of 0 do not belong to any object, pixels with a value of 1 belong to the first object, pixels with a value of 2 belong to the second object, and so on.

For each object and each channel of the image, I would like to calculate the mean, median, std, min and max of the channel. So, for example, I would like to know the mean, median, std, min and max values of the first channel for the pixels in object 10.

I have written code to do this (shown below), but it is very slow. I wondered whether anyone has a better approach or knows of a package that might help make this faster or more efficient. (Here the word 'stain' means the same as channel.)

import numpy as np
import pandas as pd
from tifffile import imread

# input_img, input_mask and args come from the surrounding script (not shown)
sample = imread(input_img)
label_mask = np.load(input_mask)

n_stains = sample.shape[2]
n_labels = np.max(label_mask)

# Collect one row of intensity measurements per (label, stain) pair
# (DataFrame.append was removed in pandas 2.0, so rows are gathered in a list)
rows = []

for label in range(1, n_labels+1):
    for stain in range(n_stains):
        # Extract the pixels of this stain that belong to this label
        stain_label = sample[:,:,stain][label_mask == label]

        # Calculate intensity measurements
        # (names chosen so that the builtins min and max are not shadowed)
        mean_val = np.mean(stain_label)
        median_val = np.median(stain_label)
        std_val = np.std(stain_label)
        min_val = np.min(stain_label)
        max_val = np.max(stain_label)

        # Add intensity measurements to the list of rows
        rows.append({'sample' : args.input_img, 'label': label, 'stain': stain, 'mean': mean_val, 'median': median_val, 'std': std_val, 'min': min_val, 'max': max_val})

intensity_measurements = pd.DataFrame(rows, columns = ['sample', 'label', 'stain', 'mean', 'median', 'std', 'min', 'max'])

Answer 1

Your code is slow because you iterate over the whole image for each of the labels. This is an O(nk) operation, for n pixels and k labels. You could instead iterate over the image once, and for each pixel examine the label, then update the measurements for that label with the pixel values. This is an O(n) operation. You'd keep an accumulator for each label and each measurement (the standard deviation requires accumulating the sum of squares as well as the sum, but you are already accumulating the sum for the mean). The only measure that you cannot compute this way is the median, as it requires a partial sort of the full list of values.

This would obviously be a much cheaper operation, except for the fact that Python is a slow, interpreted language, and looping over each pixel in Python leads to a very slow program. In a compiled language you would implement it this way though.

See this answer for a way to implement this efficiently using NumPy functionality.
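As a rough illustration of the accumulator idea described above, here is a minimal NumPy sketch. It is not necessarily the approach of the linked answer: it assumes `np.bincount` for the per-label sums and `np.minimum.at` / `np.maximum.at` for the extremes, and the helper name `label_stats` is made up for this example. Labels with no pixels come out as NaN (mean, std) or infinity (min, max).

```python
import numpy as np

def label_stats(sample, label_mask):
    """Per-label, per-channel mean/std/min/max without looping over labels.

    Hypothetical helper: returns a dict of arrays of shape
    (n_labels, n_channels); row i describes label i + 1.
    """
    labels = label_mask.ravel().astype(np.intp)
    n_bins = int(labels.max()) + 1              # bin 0 is the background
    n_channels = sample.shape[2]
    counts = np.bincount(labels, minlength=n_bins).astype(float)

    mean = np.empty((n_bins, n_channels))
    std = np.empty((n_bins, n_channels))
    minimum = np.empty((n_bins, n_channels))
    maximum = np.empty((n_bins, n_channels))

    for c in range(n_channels):
        values = sample[:, :, c].ravel().astype(float)
        # Accumulate the sum and the sum of squares per label in one pass each
        total = np.bincount(labels, weights=values, minlength=n_bins)
        total_sq = np.bincount(labels, weights=values ** 2, minlength=n_bins)
        with np.errstate(divide="ignore", invalid="ignore"):
            mean[:, c] = total / counts
            variance = total_sq / counts - mean[:, c] ** 2
            # Clip tiny negative values caused by floating-point cancellation
            std[:, c] = np.sqrt(np.maximum(variance, 0.0))
        # Unbuffered in-place reductions give the per-label extremes
        lo = np.full(n_bins, np.inf)
        hi = np.full(n_bins, -np.inf)
        np.minimum.at(lo, labels, values)
        np.maximum.at(hi, labels, values)
        minimum[:, c] = lo
        maximum[:, c] = hi

    # Drop the background row (label 0)
    return {"mean": mean[1:], "std": std[1:], "min": minimum[1:], "max": maximum[1:]}
```

The standard deviation here is the population value (the same as the default of `np.std`). Computing it from the sum of squares can lose precision when the values are large compared to their spread, so treat this as a sketch rather than a numerically robust implementation.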

Using the DIPlib library (disclosure: I'm an author) you can apply the operation as follows (the median is not implemented). Other image processing libraries have similar functionality, though they might not be as flexible with the number of channels.

import diplib as dip
import imageio
import numpy as np

# sample = imread(input_img)
# label_mask = np.load(input_mask)
# Alternative random data so that I can run the code for testing:
sample = imageio.imread("../images/trui_c.tif")
label_mask = np.random.randint(0, 20, sample.shape[:2], dtype=np.uint32)

sample = dip.Image(sample, tensor_axis=2)
msr = dip.MeasurementTool.Measure(label_mask, sample, features=["Mean", "StandardDeviation", "MinVal", "MaxVal"])
print(msr)

This prints out:

   |                                 Mean |                    StandardDeviation |                               MinVal |                               MaxVal |
-- | ------------------------------------ | ------------------------------------ | ------------------------------------ | ------------------------------------ |
   |      chan0 |      chan1 |      chan2 |      chan0 |      chan1 |      chan2 |      chan0 |      chan1 |      chan2 |      chan0 |      chan1 |      chan2 |
   |            |            |            |            |            |            |            |            |            |            |            |            |
-- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
 1 |      82.26 |      41.30 |      24.77 |      57.77 |      52.16 |      48.22 |      5.000 |      3.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 2 |      82.02 |      41.18 |      24.85 |      52.16 |      48.22 |      48.33 |      3.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 3 |      82.39 |      41.17 |      24.93 |      48.22 |      48.33 |      48.48 |      1.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 4 |      82.14 |      41.62 |      25.03 |      48.33 |      48.48 |      48.47 |      1.000 |      1.000 |      0.000 |      255.0 |      255.0 |      255.0 |
 5 |      82.89 |      41.45 |      24.94 |      48.48 |      48.47 |      48.54 |      1.000 |      0.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 6 |      82.83 |      41.60 |      25.26 |      48.47 |      48.54 |      48.65 |      0.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 7 |      81.95 |      41.77 |      25.51 |      48.54 |      48.65 |      48.22 |      1.000 |      1.000 |      2.000 |      255.0 |      255.0 |      255.0 |
 8 |      82.93 |      41.36 |      25.19 |      48.65 |      48.22 |      48.11 |      1.000 |      2.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 9 |      81.88 |      41.70 |      25.07 |      48.22 |      48.11 |      47.69 |      2.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
10 |      81.46 |      41.40 |      24.82 |      48.11 |      47.69 |      48.32 |      1.000 |      1.000 |      2.000 |      255.0 |      255.0 |      255.0 |
11 |      81.33 |      40.98 |      24.76 |      47.69 |      48.32 |      48.85 |      1.000 |      2.000 |      1.000 |      255.0 |      255.0 |      255.0 |
12 |      82.30 |      41.55 |      25.12 |      48.32 |      48.85 |      48.75 |      2.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
13 |      82.43 |      41.50 |      25.15 |      48.85 |      48.75 |      48.89 |      1.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
14 |      83.29 |      42.11 |      25.65 |      48.75 |      48.89 |      48.32 |      1.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
15 |      83.20 |      41.64 |      25.28 |      48.89 |      48.32 |      48.13 |      1.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
16 |      81.51 |      40.92 |      24.76 |      48.32 |      48.13 |      48.73 |      1.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
17 |      81.81 |      41.31 |      24.71 |      48.13 |      48.73 |      48.49 |      1.000 |      1.000 |      0.000 |      255.0 |      255.0 |      255.0 |
18 |      83.58 |      41.85 |      25.25 |      48.73 |      48.49 |      32.20 |      1.000 |      0.000 |      1.000 |      255.0 |      255.0 |      212.0 |
19 |      82.12 |      41.24 |      25.06 |      48.49 |      32.20 |      24.44 |      0.000 |      1.000 |      1.000 |      255.0 |      212.0 |      145.0 |

I don't have an efficient solution for the median. You'd have to split the image into a separate array for each label, then run the median over that. This would be as efficient as the above, but would use much more memory.
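One way to sketch the "split per label, then take the median" idea in NumPy is to sort the pixels by label once and cut the sorted array at the label boundaries. The helper name `label_medians` is made up for this example, and the sorted copy of the pixel data is the extra memory cost mentioned above.

```python
import numpy as np

def label_medians(sample, label_mask):
    """Per-label, per-channel median (hypothetical helper).

    Returns a dict mapping each non-zero label to an array of
    n_channels medians.
    """
    labels = label_mask.ravel()
    # Sort the pixels by label so each label forms one contiguous block
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    pixels = sample.reshape(-1, sample.shape[2])[order]

    # First index of each label in the sorted array marks a block boundary
    unique_labels, starts = np.unique(sorted_labels, return_index=True)
    groups = np.split(pixels, starts[1:])

    return {
        int(label): np.median(group, axis=0)
        for label, group in zip(unique_labels, groups)
        if label != 0  # skip the background
    }
```

The sort costs O(n log n) rather than O(n), but it is done once for all labels and channels instead of once per label.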

Answer 2

The method proposed below uses matrix multiplications to speed up the calculations.
It is built on two crucial NumPy tools:

https://numpy.org/doc/stable/reference/generated/numpy.einsum.html?highlight=einsum#numpy.einsum

Evaluates the Einstein summation convention on the operands.

https://numpy.org/doc/stable/reference/maskedarray.html

Masked arrays are arrays that may have missing or invalid entries. The numpy.ma module provides a nearly work-alike replacement for numpy that supports data arrays with masks.

Masked array update: the initial code was revised to use masked arrays after https://stackoverflow.com/users/7328782/cris-luengo spotted a mistake in my initial code:

"This replaces all the non-selected pixels for a given label with a 0 value, and includes all those zeros into the measurements."

Now we mask the non-selected pixels before the measurement calculations.

import numpy as np
import numpy.ma as ma
import pandas as pd
from tifffile import imread

sample = imread(input_img)
label_mask = np.load(input_mask)

n_labels = np.max(label_mask)
n_stains = sample.shape[2]

# let's create boolean label masks for each label,
# producing a 3D matrix where the last axis is the label
label_mask_unraveled = np.equal.outer(label_mask, np.arange(1, n_labels +1))

# now we can apply these boolean label masks simultaneously
# to all the sample channels with the help of 'einsum', producing a 4D matrix
# where the 1st axis is channel/stain and the 2nd axis is label
sample_label_masks_applied = np.einsum("ijk,ijl->klij", sample, label_mask_unraveled)

# in order to exclude the non-selected pixels
# from the measurement calculations, we mask the pixels first
non_selected_pixels_mask = np.moveaxis(~label_mask_unraveled, -1, 0)[np.newaxis, :, :, :]
non_selected_pixels_mask = np.repeat(non_selected_pixels_mask, n_stains, axis=0)

sample_label_masks_applied = ma.masked_array(sample_label_masks_applied, non_selected_pixels_mask)

# intensity measurement calculations
# embedded into pd.DataFrame initialization
# (each result has shape (n_stains, n_labels), so after flatten() the label
# varies fastest: labels are tiled and each stain is repeated n_labels times)
intensity_measurements = pd.DataFrame(
    {
        "sample": args.input_img,
        "label": np.tile(np.arange(1, n_labels + 1), n_stains),
        "stain": np.repeat(np.arange(n_stains), n_labels),
        "mean": ma.mean(sample_label_masks_applied, axis=(2, 3)).flatten(),
        "median": ma.median(sample_label_masks_applied, axis=(2, 3)).flatten(),
        "std": ma.std(sample_label_masks_applied, axis=(2, 3)).flatten(),
        "min": ma.min(sample_label_masks_applied, axis=(2, 3)).flatten(),
        "max": ma.max(sample_label_masks_applied, axis=(2, 3)).flatten()
    }
)
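To make the difference between zeroing and masking concrete, here is a tiny sketch on a made-up 1D array (the values and the `selected` mask are purely illustrative): multiplying by a boolean mask leaves zeros that distort the statistics, while a masked array leaves the non-selected entries out entirely.

```python
import numpy as np
import numpy.ma as ma

values = np.array([4.0, 6.0, 8.0, 10.0])
selected = np.array([True, True, False, False])  # pixels belonging to the label

# Zeroing: the two non-selected pixels still count as zeros
zeroed = values * selected
print(zeroed.mean())   # 2.5

# Masking: mask=True marks entries to ignore, hence the inversion
masked = ma.masked_array(values, mask=~selected)
print(masked.mean())   # 5.0
```

Note that the 4D array in the answer holds one full-size image per (channel, label) pair, so its memory use grows with the number of labels.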

Answer 3

I've found a good solution that works for me using scikit-image, specifically the regionprops functions.

import numpy as np
import pandas as pd
from skimage.measure import regionprops, regionprops_table
np.random.seed(42)

Here is a random "image" and a label mask for that image:

img = np.random.randint(0, 255, size=(100, 100, 3))
mask = np.zeros((100, 100)).astype(np.uint8)
mask[20:50, 20:50] = 1
mask[65:70, 65:70] = 2

There is already a built-in function for measuring the mean intensity of each channel, and it is very fast:

pd.DataFrame(regionprops_table(mask, img, properties=['label', 'mean_intensity']))

You can also pass custom functions that take a binary mask and one channel of an intensity image to regionprops_table:

def my_mean_func(mask, img):
    return np.mean(img[mask])

pd.DataFrame(regionprops_table(mask, img, properties=['label'], extra_properties=[my_mean_func]))

This is fast because the binary mask and intensity image passed to the custom function are cropped to the minimum bounding box of the mask. Therefore, the computations are much faster, as they operate over a much smaller area.

This only allows the user to calculate values per channel, but there is a generalisation that returns a 3D matrix of the selected region, so that between-channel measurements (or any other measurements you like) can be made.

props = regionprops(mask, img)

for prop in props:
    print("Region ", prop['label'], ":")
    print("Mean intensity: ", prop['mean_intensity'])
    print()

This is only an example of the very basic functionality.
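As a sketch of how the remaining statistics from the question could be obtained along the same lines, the loop below crops the image to each region's bounding box with `prop.slice` and selects the region's pixels with the binary `prop.image`. The data mirrors the random example above (regenerated here with `np.random.default_rng`, so the values differ), and the `rows` list and its keys are made up for this illustration.

```python
import numpy as np
from skimage.measure import regionprops

rng = np.random.default_rng(42)
img = rng.integers(0, 255, size=(100, 100, 3))
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:50, 20:50] = 1
mask[65:70, 65:70] = 2

rows = []
for prop in regionprops(mask, img):
    # Crop to the bounding box, then keep only the pixels inside the region;
    # the result has shape (n_pixels, n_channels)
    pixels = img[prop.slice][prop.image]
    rows.append({
        "label": prop.label,
        "mean": pixels.mean(axis=0),
        "median": np.median(pixels, axis=0),
        "std": pixels.std(axis=0),
        "min": pixels.min(axis=0),
        "max": pixels.max(axis=0),
    })
```

Because all channels of a region are available at once in `pixels`, between-channel measurements (for example a correlation between two channels) can be computed in the same loop.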

I haven't had time to benchmark any of the above algorithms, but the ones used in this answer are very fast indeed, and I use them to process very large images quite quickly. However, it is important to note that one of the reasons this is so much faster for me is that I expect each object (all entries of the label mask that share the same value) to occupy only a very small part of the image. Therefore, the minimum bounding box representation returned by regionprops is much smaller than the original image, which drastically speeds up computation.

Thank you very much to everyone for their help.
