TensorFlow U-Net segmentation mask input

Question

I am new to TensorFlow and semantic segmentation.

I am designing a U-Net for semantic segmentation. Each image has one object that I want to classify, but in total I have images of 10 different objects. I am confused about how to prepare my mask input. Is this considered multi-label segmentation, or segmentation for only one class?

Should I convert my input to one-hot encoding? Should I use to_categorical? I find examples for multi-class segmentation, but I don't know if that is the case here, because each image has only one object to detect/classify.

I tried using the code below for my input, but I am not sure whether what I am doing is right.

import os
import cv2
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Generation of batches of image and mask
class DataGen(keras.utils.Sequence):
    def __init__(self, image_names, path, batch_size, image_size=128):
        self.image_names = image_names
        self.path = path
        self.batch_size = batch_size
        self.image_size = image_size

    def __load__(self, image_name):
        # Path
        image_path = os.path.join(self.path, "images/aug_test", image_name) + ".png"
        mask_path = os.path.join(self.path, "masks/aug_test",image_name) +  ".png"

        # Reading Image
        image = cv2.imread(image_path, 1)
        image = cv2.resize(image, (self.image_size, self.image_size))


        # Reading Mask
        mask = cv2.imread(mask_path, -1)
        mask = cv2.resize(mask, (self.image_size, self.image_size))

        ## Normalizing
        image = image/255.0
        mask = mask/255.0

        return image, mask

    def __getitem__(self, index):
        if(index+1)*self.batch_size > len(self.image_names):
            self.batch_size = len(self.image_names) - index*self.batch_size

        image_batch = self.image_names[index*self.batch_size : (index+1)*self.batch_size]

        image = []
        mask  = []

        for image_name in image_batch:
            _img, _mask = self.__load__(image_name)
            image.append(_img)
            mask.append(_mask)

        #This is where I am defining my input
        image = np.array(image)
        mask  = np.array(mask)
        mask = tf.keras.utils.to_categorical(mask, num_classes=10, dtype='float32') #Is this true?


        return image, mask

    def __len__(self):
        return int(np.ceil(len(self.image_names)/float(self.batch_size)))

Is this correct? If it is, what should I change in my input to get the label/class as output? Should I change the pixel values of my mask according to the class?

Here is my U-Net architecture.

# Convolution and deconvolution Blocks

def down_scaling_block(x, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    pool = keras.layers.MaxPool2D((2, 2), (2, 2))(conv)
    return conv, pool

def up_scaling_block(x, skip, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv_t = keras.layers.UpSampling2D((2, 2))(x)
    concat = keras.layers.Concatenate()([conv_t, skip])
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(concat)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    return conv

def bottleneck(x, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    return conv

def UNet():
    filters = [16, 32, 64, 128, 256]
    inputs = keras.layers.Input((image_size, image_size, 3))

    '''inputs2 = keras.layers.Input((image_size, image_size, 1))
       conv1_2, pool1_2 = down_scaling_block(inputs2, filters[0])'''

    Input = inputs
    conv1, pool1 = down_scaling_block(Input, filters[0])
    conv2, pool2 = down_scaling_block(pool1, filters[1])
    conv3, pool3 = down_scaling_block(pool2, filters[2])
    '''conv3 = keras.layers.Conv2D(filters[2], kernel_size=(3,3), padding="same", strides=1, activation="relu")(pool2)
    conv3 = keras.layers.Conv2D(filters[2], kernel_size=(3,3), padding="same", strides=1, activation="relu")(conv3)
    drop3 = keras.layers.Dropout(0.5)(conv3)
    pool3 = keras.layers.MaxPooling2D((2,2), (2,2))(drop3)'''
    conv4, pool4 = down_scaling_block(pool3, filters[3])

    bn = bottleneck(pool4, filters[4])

    deConv1 = up_scaling_block(bn, conv4, filters[3]) #8 -> 16
    deConv2 = up_scaling_block(deConv1, conv3, filters[2]) #16 -> 32
    deConv3 = up_scaling_block(deConv2, conv2, filters[1]) #32 -> 64
    deConv4 = up_scaling_block(deConv3, conv1, filters[0]) #64 -> 128

    outputs = keras.layers.Conv2D(10, (1, 1), padding="same", activation="softmax")(deConv4)
    model = keras.models.Model(inputs, outputs)
    return model

model = UNet()
model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["acc"])

train_gen = DataGen(train_img, train_path, image_size=image_size, batch_size=batch_size)
valid_gen = DataGen(valid_img, train_path, image_size=image_size, batch_size=batch_size)
test_gen = DataGen(test_img, test_path, image_size=image_size, batch_size=batch_size)

train_steps = len(train_img)//batch_size
valid_steps = len(valid_img)//batch_size

model.fit_generator(train_gen, validation_data=valid_gen, steps_per_epoch=train_steps, validation_steps=valid_steps, 
                    epochs=epochs)

I hope I explained my question properly. Any help appreciated!

UPDATE: I changed the value of each pixel in the mask according to the object class. (If the image contains an object that I want to classify as object no. 2, I set the mask pixels of that object to 2, so the whole mask array contains 0 (background) and 2 (object). Likewise for each object, the mask contains 0 and 3, 0 and 10, etc.)

I first converted the mask to binary, and then, wherever the pixel value was greater than 1, I changed it to 1, 2, 3, etc., according to the object/class number.

Then I converted the masks to one-hot with to_categorical, as shown in my code. Training runs, but the network doesn't learn anything: accuracy and loss keep swinging between two values. What is my mistake here? Am I making a mistake when generating the mask (changing the pixel values), or in the to_categorical call?

PROBLEM FOUND: I was making an error while creating the mask. I was reading the image with cv2, which returns it as height x width, but I was creating the mask with per-class pixel values while treating the image dimensions as width x height. That caused the problem and kept the network from learning anything. It is working now.
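
The row/column mix-up described above is easy to reproduce. A minimal sketch, assuming the mask is built by hand as a NumPy array (`make_class_mask` and its `box` format are hypothetical names, not from the original code): arrays read by cv2 are indexed as (row, column), that is (height, width), so the mask has to be allocated and indexed in the same order.

```python
import numpy as np

def make_class_mask(height, width, box, class_id):
    """Build a (height, width) mask: 0 for background, class_id inside box.

    box is (x_min, y_min, x_max, y_max) in pixel coordinates (hypothetical format).
    """
    x_min, y_min, x_max, y_max = box
    mask = np.zeros((height, width), dtype=np.uint8)  # rows first, then columns
    mask[y_min:y_max, x_min:x_max] = class_id         # index as [y, x], not [x, y]
    return mask

# A 100-pixel-high, 200-pixel-wide image has array shape (100, 200).
mask = make_class_mask(height=100, width=200, box=(150, 10, 190, 40), class_id=2)
print(mask.shape)        # (100, 200)
print(np.unique(mask))   # [0 2]
```

Swapping the two axes in either the allocation or the slice puts the object in the wrong place (or raises an index error for non-square images), which would be consistent with a network that fails to learn.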

Answer 1

Each image has one object that I want to classify. But in total I have images of 10 different objects. I am confused, how can I prepare my mask input? Is it considered as multi-label segmentation or only for one class?

If your dataset has N different labels (e.g. 0 - background, 1 - dogs, 2 - cats...), you have a multi-class problem, even if each image contains only one kind of object.
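
For the dataset in the question, this might look as follows. A minimal sketch, assuming each mask file is binary (0 for background, non-zero for the object) and the object's class number (1 to 10) is known per image; `binary_to_class_mask` is a hypothetical helper. Under the assumption that background counts as class 0, ten object types give 11 classes in total, so both `num_classes` in `to_categorical` and the number of filters in the final `Conv2D` would be 11.

```python
import numpy as np

NUM_OBJECT_CLASSES = 10
NUM_CLASSES = NUM_OBJECT_CLASSES + 1  # assumption: background is class 0

def binary_to_class_mask(binary_mask, class_id):
    """Turn a 0/255 (or 0/1) mask into a mask holding 0 and class_id."""
    if not 1 <= class_id <= NUM_OBJECT_CLASSES:
        raise ValueError("class_id must be between 1 and NUM_OBJECT_CLASSES")
    return np.where(binary_mask > 0, class_id, 0).astype(np.uint8)

binary = np.array([[0, 255, 255],
                   [0,   0, 255]], dtype=np.uint8)
class_mask = binary_to_class_mask(binary, class_id=3)
print(class_mask)
```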

Should I convert my input to one-hot encoding? Should I use to_categorical?

Yes, you should one-hot encode your labels. Whether you can use to_categorical boils down to the source format of your labels. Say you have N classes and your labels are (height, width, 1), where each pixel has a value in the range [0, N). In that case keras.utils.to_categorical(label, N) will return a float (height, width, N) label, where each entry is 0 or 1. And you don't have to divide by 255.
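
To make the shapes concrete, here is a minimal NumPy sketch of the same one-hot encoding (an equivalent written with `np.eye` for illustration, not the Keras implementation). It also shows why dividing an integer class mask by 255 is harmful: the class ids become small fractions that turn into 0 once they are cast back to integers, so every pixel ends up as background.

```python
import numpy as np

N = 4  # number of classes, background included (example value)

# Integer label mask of shape (height, width), values in [0, N).
label = np.array([[0, 1, 1],
                  [0, 3, 2]])

one_hot = np.eye(N, dtype=np.float32)[label]   # shape (height, width, N)
print(one_hot.shape)                            # (2, 3, 4)
print(one_hot[1, 1])                            # [0. 0. 0. 1.]

# Dividing by 255 first destroys the class ids once they are cast to int.
scaled = (label / 255.0).astype(int)
print(np.unique(scaled))                        # [0]
```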

If your source format is different, you may have to use a custom function to get the same output format.
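
As one example of a different source format, masks are sometimes stored as color images in which each class has its own RGB color. A minimal sketch of such a custom converter, assuming a hypothetical `PALETTE` (the colors below are made up for illustration and are not from the question):

```python
import numpy as np

# Hypothetical palette: index in the list = class id.
PALETTE = [
    (0, 0, 0),      # 0: background
    (255, 0, 0),    # 1: first object class
    (0, 255, 0),    # 2: second object class
]

def rgb_mask_to_one_hot(rgb_mask, palette=PALETTE):
    """Convert a (height, width, 3) color mask to (height, width, N) one-hot."""
    height, width, _ = rgb_mask.shape
    one_hot = np.zeros((height, width, len(palette)), dtype=np.float32)
    for class_id, color in enumerate(palette):
        # Pixels whose three channels all match this class color.
        matches = np.all(rgb_mask == np.array(color), axis=-1)
        one_hot[matches, class_id] = 1.0
    return one_hot

rgb = np.array([[[0, 0, 0], [255, 0, 0]],
                [[0, 255, 0], [255, 0, 0]]], dtype=np.uint8)
encoded = rgb_mask_to_one_hot(rgb)
print(encoded.shape)              # (2, 2, 3)
print(encoded.argmax(axis=-1))    # class id per pixel
```

If the mask is read with cv2, the channels arrive in BGR order, so the palette would have to be written in BGR as well.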

Check out this repo (not my work): keras-unet. The notebooks folder contains two examples that train a U-Net on small datasets. They are not multi-class, but it is easy to adapt them step by step to your own dataset. Start by loading your labels as:

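from PIL import Image
from keras.utils import to_categorical
im = Image.open(mask).resize((512, 512), Image.NEAREST)  # nearest-neighbor keeps class ids intact
im = to_categorical(im, NCLASSES)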

Reshape and normalize like this:

x = np.asarray(imgs_np, dtype=np.float32)/255
y = np.asarray(masks_np, dtype=np.float32)
y = y.reshape(y.shape[0], y.shape[1], y.shape[2], NCLASSES)
x = x.reshape(x.shape[0], x.shape[1], x.shape[2], 3)

Adapt your model to NCLASSES:

model = custom_unet(
    input_shape,
    use_batch_norm=False,
    num_classes=NCLASSES,
    filters=64,
    dropout=0.2,
    output_activation='softmax')

Select the correct loss:

from keras.optimizers import SGD
from keras_unet.metrics import iou, iou_thresholded  # metrics shipped with keras-unet
model.compile(
    optimizer=SGD(lr=0.01, momentum=0.99),
    loss='categorical_crossentropy',
    metrics=[iou, iou_thresholded])
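
To see what this loss expects, here is a minimal NumPy sketch of per-pixel categorical cross-entropy (written out by hand for illustration, not the Keras implementation). Both the one-hot target and the softmax output have shape (height, width, NCLASSES), and the loss is the mean over pixels of the negative log-probability assigned to the true class.

```python
import numpy as np

def pixelwise_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over pixels of -sum_c y_true[c] * log(y_pred[c])."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    per_pixel = -np.sum(y_true * np.log(y_pred), axis=-1)
    return per_pixel.mean()

# One row of two pixels, three classes.
y_true = np.array([[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]])
y_pred = np.array([[[0.8, 0.1, 0.1], [0.2, 0.3, 0.5]]])

loss = pixelwise_categorical_crossentropy(y_true, y_pred)
print(round(float(loss), 4))  # 0.4581
```

This is also why the targets must be one-hot along the last axis: with integer class ids instead, the shapes of target and prediction would not match.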

Hope it helps.

Answer 1
1 accepted 2020-03-04 10:56:55
