TensorFlow convolutional neural network with images of different sizes

Question

I am attempting to create a deep CNN that can classify each individual pixel in an image. I am replicating the architecture shown in the image below, taken from this paper. The paper mentions that deconvolutions are used so that input of any size is possible. This can be seen in the image below.

GitHub Repository

Currently, I have hard-coded my model to accept images of size 32x32x7, but I would like to accept input of any size. What changes would I need to make to my code to accept variable-sized input?

 x = tf.placeholder(tf.float32, shape=[None, 32*32*7])
 y_ = tf.placeholder(tf.float32, shape=[None, 32*32*7, 3])
 ...
 DeConnv1 = tf.nn.conv3d_transpose(layer1, filter = w, output_shape = [1,32,32,7,1], strides = [1,2,2,2,1], padding = 'SAME')
 ...
 final = tf.reshape(final, [1, 32*32*7])
 W_final = weight_variable([32*32*7,32*32*7,3])
 b_final = bias_variable([32*32*7,3])
 final_conv = tf.tensordot(final, W_final, axes=[[1], [1]]) + b_final

Answer 1

Dynamic placeholders

TensorFlow allows multiple dynamic (i.e. None) dimensions in placeholders. The engine cannot check correctness while the graph is being built, so the client is responsible for feeding correctly shaped input, but this gives a lot of flexibility.

So I'm going from...

x = tf.placeholder(tf.float32, shape=[None, N*M*P])
y_ = tf.placeholder(tf.float32, shape=[None, N*M*P, 3])
...
x_image = tf.reshape(x, [-1, N, M, P, 1])

to...

# Nearly all dimensions are dynamic
x_image = tf.placeholder(tf.float32, shape=[None, None, None, None, 1])
label = tf.placeholder(tf.float32, shape=[None, None, 3])

Since you intend to reshape the input to 5D anyway, why not use 5D in x_image right from the start? At this point the second dimension of label is arbitrary, but we promise TensorFlow that it will match x_image.
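To make that "promise" concrete, here is a small plain-Python sketch (no TensorFlow involved) of what a placeholder with None dimensions accepts and what the caller still has to guarantee. The helper names is_compatible and check_feed are hypothetical and only serve the illustration; this is not how TensorFlow implements its shape checks.

```python
def is_compatible(static_shape, runtime_shape):
    """Return True if a concrete runtime shape fits a static shape.

    A None entry in static_shape stands for a dynamic dimension and
    matches any size; every other entry must match exactly.
    """
    if len(static_shape) != len(runtime_shape):
        return False
    return all(s is None or s == r
               for s, r in zip(static_shape, runtime_shape))


# Static shapes of the two placeholders from the answer.
X_IMAGE_SHAPE = (None, None, None, None, 1)
LABEL_SHAPE = (None, None, 3)


def check_feed(image_shape, label_shape):
    """Check a feed against both placeholders and the voxel-count promise."""
    if not is_compatible(X_IMAGE_SHAPE, image_shape):
        return False
    if not is_compatible(LABEL_SHAPE, label_shape):
        return False
    batch, n, m, p, _ = image_shape
    # The placeholders alone cannot express this relation,
    # so the caller has to ensure it when feeding data.
    return label_shape[0] == batch and label_shape[1] == n * m * p


print(check_feed((1, 32, 32, 7, 1), (1, 32 * 32 * 7, 3)))  # True
print(check_feed((2, 16, 16, 3, 1), (2, 16 * 16 * 3, 3)))  # True
print(check_feed((1, 16, 16, 3, 1), (1, 32 * 32 * 7, 3)))  # False
```

The last call passes both placeholder checks on its own; it fails only because the label does not have one row per voxel of the image, which is exactly the part the placeholders cannot enforce.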

Dynamic shapes in deconvolution

Next, the nice thing about tf.nn.conv3d_transpose is that its output shape can be dynamic. So instead of this:

# Hard-coded output shape
DeConnv1 = tf.nn.conv3d_transpose(layer1, w, output_shape=[1,32,32,7,1], ...)

... you can do this:

# Dynamic output shape
DeConnv1 = tf.nn.conv3d_transpose(layer1, w, output_shape=tf.shape(x_image), ...)

This way the transposed convolution can be applied to any image, and the result takes the shape of the x_image that was actually passed in at runtime.

Note that the static shape of x_image is (?, ?, ?, ?, 1).
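The reason output_shape has to be supplied at all can be seen from the shape arithmetic. With 'SAME' padding, a strided convolution produces ceil(size / stride) along each spatial dimension, so several input sizes collapse to the same output size, and the transposed convolution cannot know which one to restore. The plain-Python sketch below illustrates this; the function names are made up for the example.

```python
import math


def same_conv_output(size, stride):
    """Spatial output size of a SAME-padded convolution."""
    return math.ceil(size / stride)


def valid_transpose_outputs(size, stride):
    """Output sizes a SAME-padded transposed convolution could restore.

    These are exactly the sizes that the forward convolution with the
    same stride would map back to `size`.
    """
    return [out for out in range(1, size * stride + 1)
            if same_conv_output(out, stride) == size]


# Downsampling with stride 2, as in the first convolution of the network.
for dims in [(32, 32, 7), (16, 16, 3)]:
    down = tuple(same_conv_output(d, 2) for d in dims)
    print(dims, '->', down)

# A depth of 4 after downsampling could have come from 7 or from 8.
print(valid_transpose_outputs(4, 2))  # [7, 8]
```

Passing tf.shape(x_image) resolves this ambiguity at runtime by choosing the size the input actually had.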

All-Convolutional network

The final and most important piece of the puzzle is to make the whole network convolutional, and that includes your final dense layer too. A dense layer must define its dimensions statically, which forces the whole neural network to fix the input image dimensions.

Luckily for us, Springenberg et al. describe a way to replace an FC layer with a CONV layer in the paper "Striving for Simplicity: The All Convolutional Net". I'm going to use a convolution with three 1x1x1 filters (see also this question):

final_conv = conv3d_s1(final, weight_variable([1, 1, 1, 1, 3]))
y = tf.reshape(final_conv, [-1, 3])

If we ensure that final has the same dimensions as DeConnv1 (and the others), y ends up with one row of 3 class scores per voxel, i.e. the same number of elements as a label of shape [-1, N * M * P, 3].
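As a rough illustration of why the 1x1x1 convolution removes the size constraint, the following plain-Python sketch applies the same small weight matrix to every voxel independently, for two different volume sizes, and compares the parameter count with the dense layer from the question. conv1x1x1 and make_volume are illustrative helpers written for this example, not TensorFlow functions, and the weight values are arbitrary.

```python
def conv1x1x1(volume, weights):
    """Apply a 1x1x1 convolution to a volume of shape [N][M][P][C_in].

    weights has shape [C_in][C_out]. Each voxel is mapped independently,
    so the same weights work for any N, M and P.
    """
    c_out = len(weights[0])
    return [[[[sum(voxel[c] * weights[c][k] for c in range(len(voxel)))
               for k in range(c_out)]
              for voxel in row]
             for row in plane]
            for plane in volume]


def make_volume(n, m, p, value=1.0):
    """Build an [n][m][p][1] volume filled with a constant value."""
    return [[[[value] for _ in range(p)] for _ in range(m)] for _ in range(n)]


weights = [[0.5, -1.0, 2.0]]  # 1 input channel -> 3 class scores

for n, m, p in [(4, 4, 2), (2, 3, 5)]:
    out = conv1x1x1(make_volume(n, m, p), weights)
    shape = (len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))
    print(shape, out[0][0][0])

# Parameter counts for a 32x32x7 input (biases ignored).
voxels = 32 * 32 * 7
dense_weights = voxels * voxels * 3   # W_final from the question
conv_weights = 1 * 1 * 1 * 1 * 3      # the [1, 1, 1, 1, 3] filter
print(dense_weights, conv_weights)
```

The dense layer's weight count grows with the square of the number of voxels, while the 1x1x1 filter does not depend on the image size at all.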

Combining it all together

Your network is pretty large, but all the deconvolutions follow essentially the same pattern, so I've simplified my proof-of-concept code to just one deconvolution. The goal is only to show what kind of network is able to handle images of arbitrary size. A final remark: image dimensions can vary between batches, but within one batch they have to be the same.

The full code:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (placeholders and sessions)

sess = tf.InteractiveSession()

def conv3d_dilation(tempX, tempFilter):
  return tf.layers.conv3d(tempX, filters=tempFilter, kernel_size=[3, 3, 1], strides=1, padding='SAME', dilation_rate=2)

def conv3d(tempX, tempW):
  return tf.nn.conv3d(tempX, tempW, strides=[1, 2, 2, 2, 1], padding='SAME')

def conv3d_s1(tempX, tempW):
  return tf.nn.conv3d(tempX, tempW, strides=[1, 1, 1, 1, 1], padding='SAME')

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def max_pool_3x3(x):
  return tf.nn.max_pool3d(x, ksize=[1, 3, 3, 3, 1], strides=[1, 2, 2, 2, 1], padding='SAME')

x_image = tf.placeholder(tf.float32, shape=[None, None, None, None, 1])
label = tf.placeholder(tf.float32, shape=[None, None, 3])

W_conv1 = weight_variable([3, 3, 1, 1, 32])
h_conv1 = conv3d(x_image, W_conv1)
# second convolution
W_conv2 = weight_variable([3, 3, 4, 32, 64])
h_conv2 = conv3d_s1(h_conv1, W_conv2)
# third convolution path 1
W_conv3_A = weight_variable([1, 1, 1, 64, 64])
h_conv3_A = conv3d_s1(h_conv2, W_conv3_A)
# third convolution path 2
W_conv3_B = weight_variable([1, 1, 1, 64, 64])
h_conv3_B = conv3d_s1(h_conv2, W_conv3_B)
# fourth convolution path 1
W_conv4_A = weight_variable([3, 3, 1, 64, 96])
h_conv4_A = conv3d_s1(h_conv3_A, W_conv4_A)
# fourth convolution path 2
W_conv4_B = weight_variable([1, 7, 1, 64, 64])
h_conv4_B = conv3d_s1(h_conv3_B, W_conv4_B)
# fifth convolution path 2
W_conv5_B = weight_variable([1, 7, 1, 64, 64])
h_conv5_B = conv3d_s1(h_conv4_B, W_conv5_B)
# sixth convolution path 2
W_conv6_B = weight_variable([3, 3, 1, 64, 96])
h_conv6_B = conv3d_s1(h_conv5_B, W_conv6_B)
# concatenation
layer1 = tf.concat([h_conv4_A, h_conv6_B], 4)
w = tf.Variable(tf.constant(1., shape=[2, 2, 4, 1, 192]))
DeConnv1 = tf.nn.conv3d_transpose(layer1, filter=w, output_shape=tf.shape(x_image), strides=[1, 2, 2, 2, 1], padding='SAME')

final = DeConnv1
final_conv = conv3d_s1(final, weight_variable([1, 1, 1, 1, 3]))
y = tf.reshape(final_conv, [-1, 3])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=label, logits=y))

print('x_image:', x_image)
print('DeConnv1:', DeConnv1)
print('final_conv:', final_conv)

def try_image(N, M, P, B=1):
  batch_x = np.random.normal(size=[B, N, M, P, 1])
  batch_y = np.ones([B, N * M * P, 3]) / 3.0

  deconv_val, final_conv_val, loss = sess.run([DeConnv1, final_conv, cross_entropy],
                                              feed_dict={x_image: batch_x, label: batch_y})
  print(deconv_val.shape)
  print(final_conv_val.shape)
  print(loss)
  print()

tf.global_variables_initializer().run()
try_image(32, 32, 7)
try_image(16, 16, 3)
try_image(16, 16, 3, 2)
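Because all images within one batch must share the same dimensions, a dataset with mixed sizes has to be grouped by shape before batching. One possible way to do that is sketched below in plain Python; batches_by_shape is a hypothetical helper, not part of the code above, and it only works with shape tuples and sample indices rather than real image data.

```python
from collections import defaultdict


def batches_by_shape(shapes, batch_size):
    """Group sample indices so every batch contains a single image shape.

    shapes: list of (N, M, P) tuples, one per sample.
    Returns a list of (shape, [indices]) pairs.
    """
    buckets = defaultdict(list)
    for index, shape in enumerate(shapes):
        buckets[shape].append(index)

    batches = []
    for shape, indices in buckets.items():
        # Split each bucket into chunks of at most batch_size samples.
        for start in range(0, len(indices), batch_size):
            batches.append((shape, indices[start:start + batch_size]))
    return batches


sample_shapes = [(32, 32, 7), (16, 16, 3), (32, 32, 7), (16, 16, 3), (16, 16, 3)]
for shape, indices in batches_by_shape(sample_shapes, batch_size=2):
    print(shape, indices)
```

Each resulting batch could then be stacked into one array of shape [B, N, M, P, 1] and fed to x_image in the same way try_image does.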

Answer 2

Theoretically, it's possible. You need to set the image size of the input and label placeholders to None and let the graph infer the image size dynamically from the input data.

However, you have to be careful when you define the graph. Use tf.shape(x) rather than x.get_shape(). The former is evaluated dynamically, only when you call session.run, whereas the latter returns the static shape known when the graph is defined. When an input dimension is set to None, the latter cannot give the true size and simply reports None for it.

To complicate things, if you use tf.layers.conv2d or upconv2d, these high-level functions sometimes do not accept tf.shape, because they seem to assume that shape information is available during graph construction.

I wish I had a better working example to illustrate the points above. I'll leave this answer as a placeholder and come back to add more if I get the chance.

2 answers

Answer 1: score 7, accepted, 2018-01-17 17:24:03
Answer 2: score -1, 2018-01-16 03:28:21
