
Capsule networks for binary classification not training

Currently I am trying to implement a capsule network using Xifeng Guo's Keras code for capsule nets. I have a dataset of brain tumor images with 98 negatively labeled instances and 155 positively labeled instances. I would like to use the CapsNet to predict either positive or negative for a brain tumor on an image. Unfortunately I cannot figure out why it does not get beyond a fixed accuracy / loss. I have attempted data augmentation to increase the dataset size, which only resulted in 50/50 predictions.
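For reference, the augmentation attempt was along these lines (a minimal sketch assuming Keras's ImageDataGenerator; the transform parameters are illustrative placeholders, not tuned values):

from keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation transforms for the brain tumor images.
datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

# Note: the CapsNet training model defined below takes [x, y] as inputs and
# [y, x] as targets, so datagen.flow(x_train, y_train, batch_size=16) has to
# be wrapped in a generator that yields that structure before it can be used
# with fit_generator.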

I have read the paper 'Capsule Networks against Medical Imaging Data Challenges', where they applied a CapsNet to, amongst others, the DIARETDB1 dataset, which comprises only 89 images, and it gets decent predictions even without data augmentation (0.887 F1 score in imbalanced scenario 1). This makes me believe something may be going wrong in the network. FYI: my images are normalized and cropped.

Any input is appreciated!

%pylab inline
import os
import numpy as np

import tensorflow as tf
import keras
import keras.backend as K

from capsulelayers import CapsuleLayer, PrimaryCap, Length, Mask
from keras import layers, models, optimizers
from keras.applications import vgg16
from keras.layers import Conv2D, MaxPooling2D

K.set_image_data_format('channels_last')


def CapsNet(input_shape, n_class, routings):
    x = layers.Input(shape=input_shape)

    # Layer 1: Just a conventional Conv2D layer
    conv1 = Conv2D(filters=256, kernel_size=9, strides=1, padding='valid', activation='relu', name='conv1')(x)

    # Layer 2: Conv2D layer with `squash` activation, then reshape to [None, num_capsule, dim_capsule]
    primarycaps = PrimaryCap(conv1, dim_capsule=8, n_channels=32, kernel_size=9, strides=2, padding='valid')

    # Layer 3: Capsule layer. Routing algorithm works here.
    digitcaps = CapsuleLayer(num_capsule=n_class, dim_capsule=16, routings=routings,
                             name='digitcaps')(primarycaps)

    # Layer 4: This is an auxiliary layer to replace each capsule with its length. Just to match the true label's shape.
    # If using tensorflow, this will not be necessary. :)
    out_caps = Length(name='capsnet')(digitcaps)  # CAN WE EXCLUDE THIS IN KERAS TOO?

    # Decoder network.
    y = layers.Input(shape=(n_class,))
    masked_by_y = Mask()([digitcaps, y])  # The true label is used to mask the output of capsule layer. For training
    masked = Mask()(digitcaps)  # Mask using the capsule with maximal length. For prediction

    # Shared decoder model in training and prediction
    decoder = models.Sequential(name='decoder')
    decoder.add(layers.Dense(512, activation='relu', input_dim=16*n_class))
    decoder.add(layers.Dense(1024, activation='relu'))
    decoder.add(layers.Dense(np.prod(input_shape), activation='sigmoid'))
    decoder.add(layers.Reshape(target_shape=input_shape, name='out_recon'))

    # Models for training and evaluation (prediction)
    train_model = models.Model([x, y], [out_caps, decoder(masked_by_y)])
    eval_model = models.Model(x, [out_caps, decoder(masked)])

    # Manipulate model
    noise = layers.Input(shape=(n_class, 16))
    noised_digitcaps = layers.Add()([digitcaps, noise])
    masked_noised_y = Mask()([noised_digitcaps, y])
    manipulate_model = models.Model([x, y, noise], decoder(masked_noised_y))

    return train_model, eval_model, manipulate_model



def margin_loss(y_true, y_pred):
    """
    Margin loss for Eq. (4). When y_true[i, :] contains more than one `1`, this loss should still work (not tested).
    :param y_true: [None, n_classes]
    :param y_pred: [None, num_capsule]
    :return: a scalar loss value.
    """
    L = y_true * K.square(K.maximum(0., 0.9 - y_pred)) + \
        0.5 * (1 - y_true) * K.square(K.maximum(0., y_pred - 0.1))

    return K.mean(K.sum(L, 1))

model, eval_model, manipulate_model = CapsNet(input_shape=x_train.shape[1:],
                                              n_class=1,
                                              routings=2)

# compile the model
model.compile(optimizer=optimizers.Adam(lr=3e-3),
              loss=[margin_loss, 'mse'],
              metrics={'capsnet': 'accuracy'})

model.summary()

history = model.fit(
    [x_train, y_train], [y_train, x_train],
    batch_size=16,
    epochs=30,
    validation_data=([x_val, y_val], [y_val, x_val]),
    shuffle=True)

The result is many epochs in which neither the accuracy nor the loss really changes:

Epoch 1/30
161/161 [==============================] - 12s 77ms/step - loss: 0.2700 - capsnet_loss: 0.1911 - decoder_loss: 0.0789 - capsnet_acc: 0.5901 - val_loss: 0.2153 - val_capsnet_loss: 0.1588 - val_decoder_loss: 0.0565 - val_capsnet_acc: 0.6078
Epoch 2/30
161/161 [==============================] - 9s 56ms/step - loss: 0.2046 - capsnet_loss: 0.1560 - decoder_loss: 0.0486 - capsnet_acc: 0.6149 - val_loss: 0.2015 - val_capsnet_loss: 0.1588 - val_decoder_loss: 0.0427 - val_capsnet_acc: 0.6078
Epoch 3/30
161/161 [==============================] - 9s 56ms/step - loss: 0.1960 - capsnet_loss: 0.1560 - decoder_loss: 0.0401 - capsnet_acc: 0.6149 - val_loss: 0.1982 - val_capsnet_loss: 0.1588 - val_decoder_loss: 0.0394 - val_capsnet_acc: 0.6078

There exist two vector transformation procedures to obtain capsules from convolutions: matrix-vector transformation and convolutional vector transformation. Since you have a small amount of data, the convolutional vector transformation is the better choice in this case.
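For context, this convolutional route is essentially what Guo's PrimaryCap helper already does: a single Conv2D whose feature maps are regrouped into capsule vectors and squashed. A minimal sketch (the squash non-linearity follows the standard CapsNet formulation; the names conv_capsules etc. are illustrative):

import keras.backend as K
from keras import layers

def squash(vectors, axis=-1):
    # Standard CapsNet squashing non-linearity: shrinks short vectors toward
    # zero and scales long vectors toward unit length.
    s_squared_norm = K.sum(K.square(vectors), axis, keepdims=True)
    scale = s_squared_norm / (1 + s_squared_norm) / K.sqrt(s_squared_norm + K.epsilon())
    return scale * vectors

def conv_capsules(inputs, dim_capsule, n_channels, kernel_size, strides):
    # Convolutional vector transformation: one Conv2D produces
    # dim_capsule * n_channels feature maps, which are then reshaped into
    # capsule vectors of length dim_capsule.
    conv = layers.Conv2D(filters=dim_capsule * n_channels,
                         kernel_size=kernel_size, strides=strides,
                         padding='valid')(inputs)
    caps = layers.Reshape(target_shape=[-1, dim_capsule])(conv)
    return layers.Lambda(squash)(caps)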

I advise you to introduce a batch normalization layer after the first convolutional layer and see what it gives.
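In the CapsNet function above, that amounts to replacing the conv1 line with something like this (a sketch only; placing BatchNormalization between the convolution and the ReLU is one common choice, and the layer names are illustrative):

from keras.layers import Activation, BatchNormalization

# Layer 1 with batch normalization: the ReLU is split out of Conv2D so the
# normalization can sit between the convolution and the activation.
conv1 = Conv2D(filters=256, kernel_size=9, strides=1, padding='valid', name='conv1')(x)
conv1 = BatchNormalization(name='conv1_bn')(conv1)
conv1 = Activation('relu', name='conv1_relu')(conv1)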

I had the same problem with training the capsule network on some datasets where the training process did not converge. I accidentally reduced Adam's learning rate from its default of 0.001 down to 0.000001, and the problem was solved.

So, I think this parameter plays an important role here.
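Applied to the compile call above, that is simply the following (1e-6 is what worked for me; treat it as a starting point for a learning-rate sweep rather than a magic number):

model.compile(optimizer=optimizers.Adam(lr=1e-6),  # reduced from the 3e-3 used above
              loss=[margin_loss, 'mse'],
              metrics={'capsnet': 'accuracy'})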
