
Tensorflow: Simple 3D Convnet not learning

I am trying to create a simple 3D U-net for image segmentation, just to learn how to use the layers. Therefore I do a 3D convolution with stride 2 and then a transposed convolution to get back the same image size. I am also overfitting to a small set (the test set) just to see if my network is learning.

I created the same net in Keras and it works just fine. Now I want to create it in tensorflow, but I have been having trouble with it.

The cost changes slightly, but no matter what I do (reduce the learning rate, add more epochs, add more layers, change the batch size...) the output is always the same. I believe the net is not updating the weights. I am sure I am doing something wrong but I can't find what it is. Any help would be greatly appreciated.

Here is my code:

def forward_propagation(X):

    if ( mode == 'train'): print(" --------- Net --------- ")

    # Convolutional Layer 1
    with tf.variable_scope('CONV1'):
        Z1 = tf.layers.conv3d(X, filters = 16, kernel_size = [3,3,3], strides = [2, 2, 2], padding='SAME', name = 'S2/conv3d')
        A1 = tf.nn.relu(Z1, name = 'S2/ReLU')
        if ( mode == 'train'): print("Convolutional Layer 1 S2 " + str(A1.get_shape()))

    # DEConvolutional Layer 1
    with tf.variable_scope('DeCONV1'):
        output_deconv1 = tf.stack([X.get_shape()[0] , X.get_shape()[1], X.get_shape()[2], X.get_shape()[3], 1])
        dZ1 = tf.layers.conv3d_transpose(A1, filters = 1, kernel_size = [3,3,3], strides = [2, 2, 2], padding='SAME', name = 'S2/conv3d_transpose')
        dA1 = tf.nn.relu(dZ1, name = 'S2/ReLU')

        if ( mode == 'train'): print("Deconvolutional Layer 1 S1 " + str(dA1.get_shape()))

    return dA1


def compute_cost(output, target, method = 'dice_hard_coe'):

    with tf.variable_scope('COST'):       

        if (method == 'sigmoid_cross_entropy') :
            # Make them vectors
            output = tf.reshape( output, [-1, output.get_shape().as_list()[0]] )
            target = tf.reshape( target, [-1, target.get_shape().as_list()[0]] )
            loss = tf.nn.sigmoid_cross_entropy_with_logits(logits = output, labels = target)
            cost = tf.reduce_mean(loss)

    return cost

and the main function for the model:

def model(X_h5, Y_h5, learning_rate = 0.009,
          num_epochs = 100, minibatch_size = 64, print_cost = True):


    ops.reset_default_graph()                         # to be able to rerun the model without overwriting tf variables
    #tf.set_random_seed(1)                             # to keep results consistent (tensorflow seed)
    #seed = 3                                          # to keep results consistent (numpy seed)
    (m, n_D, n_H, n_W, num_channels) = X_h5["test_data"].shape   #TTT          
    num_labels = Y_h5["test_mask"].shape[4] #TTT
    img_size = Y_h5["test_mask"].shape[1]  #TTT
    costs = []                                        # To keep track of the cost
    accuracies = []                                   # To keep track of the accuracy



    # Create Placeholders of the correct shape
    X, Y = create_placeholders(n_H, n_W, n_D, minibatch_size)

    # Forward propagation: Build the forward propagation in the tensorflow graph
    nn_output = forward_propagation(X)
    prediction = tf.nn.sigmoid(nn_output)

    # Cost function: Add cost function to tensorflow graph
    cost_method = 'sigmoid_cross_entropy' 
    cost = compute_cost(nn_output, Y, cost_method)

    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)

    # Initialize all the variables globally
    init = tf.global_variables_initializer()


    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:

        print('------ Training ------')

        # Run the initialization
        tf.local_variables_initializer().run(session=sess)
        sess.run(init)

        # Do the training loop
        for i in range(num_epochs*m):
            # ----- TRAIN -------
            current_epoch = i//m            

            patient_start = i-(current_epoch * m)
            patient_end = patient_start + minibatch_size

            current_X_train = np.zeros((minibatch_size, n_D,  n_H, n_W,num_channels))
            current_X_train[:,:,:,:,:] = np.array(X_h5["test_data"][patient_start:patient_end,:,:,:,:]) #TTT
            current_X_train = np.nan_to_num(current_X_train) # make nan zero

            current_Y_train = np.zeros((minibatch_size, n_D, n_H, n_W, num_labels))
            current_Y_train[:,:,:,:,:] = np.array(Y_h5["test_mask"][patient_start:patient_end,:,:,:,:]) #TTT
            current_Y_train = np.nan_to_num(current_Y_train) # make nan zero

            feed_dict = {X: current_X_train, Y: current_Y_train}
            _ , temp_cost = sess.run([optimizer, cost], feed_dict=feed_dict)

            # ----- TEST -------
            # Print the cost 5 times over the whole training run
            if ((i % (num_epochs*m/5) )== 0):              

                # Calculate the predictions
                test_predictions = np.zeros(Y_h5["test_mask"].shape)

                for j in range(0, X_h5["test_data"].shape[0], minibatch_size):

                    patient_start = j
                    patient_end = patient_start + minibatch_size

                    current_X_test = np.zeros((minibatch_size, n_D,  n_H, n_W, num_channels))
                    current_X_test[:,:,:,:,:] = np.array(X_h5["test_data"][patient_start:patient_end,:,:,:,:])
                    current_X_test = np.nan_to_num(current_X_test) # make nan zero

                    current_Y_test = np.zeros((minibatch_size, n_D, n_H, n_W, num_labels))
                    current_Y_test[:,:,:,:,:] = np.array(Y_h5["test_mask"][patient_start:patient_end,:,:,:,:]) 
                    current_Y_test = np.nan_to_num(current_Y_test) # make nan zero

                    feed_dict = {X: current_X_test, Y: current_Y_test}
                    _, current_prediction = sess.run([cost, prediction], feed_dict=feed_dict)
                    test_predictions[j:j + minibatch_size,:,:,:,:] = current_prediction

                costs.append(temp_cost)
                print ("[" + str(current_epoch) + "|" + str(num_epochs) + "] " + "Cost : " + str(costs[-1]))
                display_progress(X_h5["test_data"], Y_h5["test_mask"], test_predictions, 5, n_H, n_W)

        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('epochs')
        plt.show()

        return  

I call the model with:

model(hdf5_data_file, hdf5_mask_file, num_epochs = 500, minibatch_size = 1, learning_rate = 1e-3)

These are the results that I am currently getting:

[screenshots of the current results]

Edit: I have tried reducing the learning rate and it doesn't help. I also tried using the TensorBoard debugger, and the weights are not being updated.

I am not sure why this is happening. I created the same simple model in Keras and it works fine. I am not sure what I am doing wrong in tensorflow.
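A minimal way to check this outside TensorBoard is to read one of the variables before and after a single training step and compare (sketch only; it reuses the sess, optimizer and feed_dict from the training loop above):

# Sketch: verify that one optimizer step actually changes a weight tensor (TF 1.x).
# Assumes `sess`, `optimizer` and `feed_dict` from the training loop above.
kernel = tf.global_variables()[0]            # e.g. the first conv kernel
before = sess.run(kernel)
sess.run(optimizer, feed_dict=feed_dict)
after = sess.run(kernel)
print("weights changed:", not np.allclose(before, after))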

Not sure if you are still looking for help, as I am answering this question half a year after you posted it. :) I've listed my observations and also some suggestions for you to try below. If my primary observation is right... then you probably just need a coffee break / a night of good sleep.

primary observation:

  • tf.reshape( output, [-1, output.get_shape().as_list()[0]] ) seems wrong. If you prefer to flatten the vector, it should be something like tf.reshape(output, [-1, np.prod(image_shape_list)]) (see the sketch below).
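A minimal sketch of what that flatten could look like inside compute_cost, assuming output and target are [batch, depth, height, width, 1] tensors whose static shapes are known (the local name voxels_per_example is just for illustration):

# Sketch: flatten each example to one vector of voxels before the loss (TF 1.x).
voxels_per_example = np.prod(output.get_shape().as_list()[1:])
logits = tf.reshape(output, [-1, voxels_per_example])
labels = tf.reshape(target, [-1, voxels_per_example])
loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels)
cost = tf.reduce_mean(loss)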

other observations:

  • With such a shallow network, I doubt the network has enough spatial resolution to differentiate tumor voxels from non-tumor voxels. Can you show the Keras implementation and its performance compared to the pure tf implementation? I would probably go with 2+ layers; say, with 3 layers, a stride of 2 per layer, and an input image width of 256, you will end up with a width of 32 at your deepest encoder layer (see the sketch after this list). (If you have limited GPU memory, downsample the input image.)
  • If changing the loss computation does not work, then, as @bremen_matt mentioned, reduce the LR to maybe 1e-5.
  • After the basic architecture tweaks, once you "feel" that the network is sort of learning and not stuck, try augmenting the training data, adding dropout and batch norm during training, and then maybe fancy up your loss by adding a discriminator.
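A rough sketch of the deeper encoder/decoder described in the first bullet, using the same tf.layers API as the question (filter counts and layer depth are illustrative, not tuned):

# Sketch: 3-level encoder/decoder with tf.layers (TF 1.x). Each stride-2 conv
# halves the spatial size (256 -> 128 -> 64 -> 32); the transposed convs undo it.
def forward_propagation(X):
    with tf.variable_scope('ENCODER'):
        e1 = tf.layers.conv3d(X,  16, kernel_size=3, strides=2, padding='SAME', activation=tf.nn.relu)
        e2 = tf.layers.conv3d(e1, 32, kernel_size=3, strides=2, padding='SAME', activation=tf.nn.relu)
        e3 = tf.layers.conv3d(e2, 64, kernel_size=3, strides=2, padding='SAME', activation=tf.nn.relu)
    with tf.variable_scope('DECODER'):
        d2 = tf.layers.conv3d_transpose(e3, 32, kernel_size=3, strides=2, padding='SAME', activation=tf.nn.relu)
        d1 = tf.layers.conv3d_transpose(d2, 16, kernel_size=3, strides=2, padding='SAME', activation=tf.nn.relu)
        logits = tf.layers.conv3d_transpose(d1, 1, kernel_size=3, strides=2, padding='SAME')  # raw logits for the sigmoid loss
    return logits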
