Keras U-Net multi-label segmentation with two input binary masks

Question

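I am using U-Net with a Keras backend for a multi-label segmentation problem. For each input image I have two masks, one for each of two different objects. The images and masks are 224 x 224; the images are RGB and the masks are grayscale. The folder structure is as follows: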

data
 |_train
     |_image 
     |_label1 (binary masks of object 1)
     |_label2 (binary masks of object 2)

I am using Qubvel's segmentation models (https://github.com/qubvel/segmentation_models) with a vgg-16 backbone. My training pipeline is shown below:

img_width, img_height = 224,224
input_shape = (img_width, img_height, 3)
model_input = Input(shape=input_shape)
n_classes=2 # masks of object 1 and object 2 
activation='sigmoid' #since I want multi-label output and not multi-class
batch_size = 16
n_epochs = 128

BACKBONE = 'vgg16'
model1 = sm.Unet(BACKBONE, 
                 encoder_weights='imagenet', 
                 classes=n_classes, 
                 activation=activation)
opt = keras.optimizers.Adam(lr=0.001) 
loss_func='binary_crossentropy'
model1.compile(optimizer=opt, 
              loss=loss_func, 
              metrics=['binary_accuracy'])

callbacks = [ModelCheckpoint(monitor='val_loss', 
                             filepath='model1.hdf5', 
                             save_best_only=True, 
                             save_weights_only=True, 
                             mode='min', 
                             verbose = 1)]
history1 = model1.fit(X_tr, Y_tr, 
                    batch_size=batch_size, 
                    epochs=n_epochs, 
                    callbacks=callbacks,
                    validation_data=(X_val, Y_val))

The shapes of the model's layers are as follows:

[(None, None, None, 3)]
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 512)
(None, None, None, 1024)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 256)
(None, None, None, 768)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 128)
(None, None, None, 384)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 64)
(None, None, None, 192)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 32)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 16)
(None, None, None, 2)
(None, None, None, 2)

Shown below is my data preparation pipeline, with two masks per image. I am trying to stack mask 1 and mask 2 for each input image:

ids = next(os.walk("data/train/image"))[2] 
print("No. of images = ", len(ids))
X = np.zeros((len(ids), im_height, im_width, 3), dtype=np.float32) #RGB input
Y = np.zeros((len(ids), im_height, im_width, 1), dtype=np.float32) #grayscale input for the masks
for n, id_ in tqdm(enumerate(ids), total=len(ids)):
    img = load_img("data/train/image/"+id_, color_mode = "rgb")
    x_img = img_to_array(img)
    x_img = resize(x_img, (224,224,3), 
                   mode = 'constant', preserve_range = True)
    # Load mask
    mask1 = img_to_array(load_img("data/train/label1/"+id_, color_mode = "grayscale"))
    mask2 = img_to_array(load_img("data/train/label2/"+id_, color_mode = "grayscale"))
    mask1 = resize(mask1, (224,224,1), 
                  mode = 'constant', preserve_range = True)
    mask2 = resize(mask2, (224,224,1), 
                  mode = 'constant', preserve_range = True)
    mask = np.stack([mask1,mask2], axis=-1)
    # Save images
    X[n] = x_img/255.0
    Y[n] = mask/255.0

X_tr, X_val, Y_tr, Y_val = train_test_split(X, Y, test_size=0.3, random_state=42)

I get the following error:

Traceback (most recent call last):

  File "/home/codes/untitled1.py", line 482, in <module>
    Y[n] = mask/255.0

ValueError: could not broadcast input array from shape (224,224,1,2) into shape (224,224,1)

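What is the correct syntax, and how should I modify the code to stack the masks and train the multi-label model? Thanks; I look forward to corrections in the code.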

Answer 1

You need to update the definition of Y, since it holds two masks and its shape must match the model's output:

Y = np.zeros((len(ids), im_height, im_width, 2), dtype=np.float32)

Then reshape the mask:

mask = np.stack([mask1,mask2], axis=-1)
# Save images
X[n] = x_img/255.0
Y[n] = np.reshape(mask/255.0, (224,224,2))
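To see why the reshape is needed, the shapes can be checked with dummy arrays standing in for the loaded masks. This is only a minimal sketch that assumes NumPy is available; the constant-valued arrays are placeholders, not real data:

```python
import numpy as np

# Placeholder masks with the shape that resize(..., (224, 224, 1)) returns.
mask1 = np.zeros((224, 224, 1), dtype=np.float32)
mask2 = np.full((224, 224, 1), 255.0, dtype=np.float32)

# np.stack adds a NEW axis, so the result has four dimensions.
stacked = np.stack([mask1, mask2], axis=-1)
print(stacked.shape)  # (224, 224, 1, 2)

# Reshaping drops the singleton axis and leaves one channel per mask.
mask = np.reshape(stacked / 255.0, (224, 224, 2))
print(mask.shape)  # (224, 224, 2)

# Channel 0 still holds mask 1 and channel 1 holds mask 2.
print(mask[..., 0].max(), mask[..., 1].min())  # 0.0 1.0
```

The four-dimensional shape printed first is the one reported in the ValueError from the question.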

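(I'm not sure, but you may be able to write directly into Y[n] instead of reshaping. np.stack would add a new axis that Y[n] does not have, so join the masks along their existing last axis with np.concatenate:

np.concatenate([mask1, mask2], axis=-1, out=Y[n])
# Save images
X[n] = x_img/255.0
Y[n] = Y[n] / 255.0

In this case no reshape is needed.)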
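The in-place variant can be tried on its own before wiring it into the loading loop. The sketch below is an illustration under stated assumptions: it uses np.concatenate (which joins along the existing last axis, so the result matches a three-dimensional Y[n]), a hypothetical dataset of four images, and random 0/255 arrays in place of the resized label1 and label2 masks:

```python
import numpy as np

n_images, height, width = 4, 224, 224  # hypothetical small dataset
Y = np.zeros((n_images, height, width, 2), dtype=np.float32)

rng = np.random.default_rng(0)
for n in range(n_images):
    # Placeholder 0/255 masks standing in for the resized label images.
    mask1 = rng.integers(0, 2, size=(height, width, 1)).astype(np.float32) * 255.0
    mask2 = rng.integers(0, 2, size=(height, width, 1)).astype(np.float32) * 255.0
    # Y[n] is a view into Y, so writing through out= fills Y in place.
    np.concatenate([mask1, mask2], axis=-1, out=Y[n])
    # Scale the two channels of this sample to the range [0, 1].
    Y[n] /= 255.0

print(Y.shape)  # (4, 224, 224, 2)
```

After the loop, Y has one channel per object, which is the layout a model built with classes=2 and a sigmoid activation produces.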

Solution 1
1 accepted, 2022-09-23 12:58:57
