
tf.nn.conv2d_transpose(): double width and height for dynamic shapes of the input tensor

When I tried to use tf.nn.conv2d_transpose() to get a layer result with doubled width and height and halved depth, it worked as long as I used a fully specified shape [batch, width, height, channel] for both input and output.

By setting the batch size to None, training works well with a specified batch size, and validation works well for a batch of images (or a single image).

Now, I'm trying to build an encoder-decoder network structure using [128 x 128 x 3] training images. (Those training images are patches cropped from [w x h x 3] original images.)

The input shape is [128 x 128 x 3] and the output shape is [128 x 128 x 3]. The first layer of the encoder-decoder structure is a convolution layer with k = 3x3, strides = 1, padding = 1.
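
For reference, a minimal sketch of what such a first convolution layer looks like, assuming TensorFlow 1.x (the function and variable names below are illustrative, not taken from my actual code):

def Conv2d(input, inC, outC):
    # 3x3 kernel, stride 1; "SAME" padding is equivalent to padding = 1 for a 3x3 kernel
    kernel = tf.Variable(tf.random_normal([3, 3, inC, outC], stddev=0.01))
    return tf.nn.conv2d(input, kernel, strides=[1, 1, 1, 1], padding="SAME")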

All of the above works well for the specified width and height (128 x 128).

However, after training finishes on the [128 x 128 x 3] training patches, I'd like to run inference on a [w x h x 3] image with the trained network.

I guess all of the operations (convolution, max pooling) give correct results, except the transpose convolution.
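
For context, the convolution and pooling layers never need the static spatial shape, so unknown width and height are not a problem for them; a minimal sketch of such a pooling layer, assuming TF 1.x:

def MaxPool2d(input):
    # 2x2 max pooling with stride 2; works even when width/height are None in the static shape
    return tf.nn.max_pool(input, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")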

When I infer a fixed image shape of [128 x 128 x 3]:

import numpy as np
import tensorflow as tf

InputImagesTensor = tf.placeholder(tf.float32, [None, 128, 128, 3], name='InputImages')
ResultImages = libs.Network(InputImagesTensor)
saver = tf.train.Saver()
w = 128
h = 128

sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.restore(sess, 'output.ckpt')
for i in range(0, len(Datas.InputImageNameList)):
    temp = np.resize(getResizedImage(Datas.InputImageList[i]), (1, 128, 128, 3))
    resultimg = sess.run(ResultImages, feed_dict={InputImagesTensor: temp})

with this transpose convolution layer inside the network:

def Transpose2d(input, inC, outC):
    b, w, h, c = input.shape                 # static shape; w and h are plain ints for a fixed-size placeholder

    batch_size = tf.shape(input)[0]          # batch dimension taken from the runtime shape
    deconv_shape = tf.stack([batch_size, int(w * 2), int(h * 2), outC])
    kernel = tf.Variable(tf.random_normal([2, 2, outC, inC], stddev=0.01))
    output_shape = [None, int(w * 2), int(h * 2), outC]   # unused
    transConv = tf.nn.conv2d_transpose(input, kernel, output_shape=deconv_shape, strides=[1, 2, 2, 1], padding="SAME")

    return transConv

Now, I tried to convert it from fixed width and height to dynamic width and height. In my opinion, this should work (however, it failed).

Change

InputImagesTensor = tf.placeholder(tf.float32, [None, 128, 128, 3], name='InputImages')
temp = np.resize(getResizedImage(Datas.InputImageList[i]), (1, 128, 128, 3))

to

InputImagesTensor = tf.placeholder(tf.float32, [None, None, None, 3], name='InputImages')
temp = np.resize(getResizedImage(Datas.InputImageList[i]), (1, w, h, 3))

However, this line raises an error:

deconv_shape = tf.stack([batch_size, int(w*2), int(h*2), outC])

TypeError: __int__ returned non-int (type NoneType)

I guess it is because the static width and height are None, and a None value cannot be doubled into 2*None.
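
For comparison, here is a minimal sketch of the same layer with every output dimension taken from the runtime tf.shape(input) instead of the static shape (the same mechanism already used for the batch dimension); this is only an assumption on my part and I have not verified it in my full setup:

def Transpose2dDynamic(input, inC, outC):
    in_shape = tf.shape(input)               # runtime shape, resolved when the graph is executed
    deconv_shape = tf.stack([in_shape[0], in_shape[1] * 2, in_shape[2] * 2, outC])
    kernel = tf.Variable(tf.random_normal([2, 2, outC, inC], stddev=0.01))
    return tf.nn.conv2d_transpose(input, kernel, output_shape=deconv_shape, strides=[1, 2, 2, 1], padding="SAME")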

How can I do this? Is it even possible?

Self answer:

I could not come up with a suitable "standard" solution that keeps w and h as None for the transpose convolution.

However, I solved the problem by setting the transpose convolution's output shape to the maximum shape of my training/validation images. For example, if the maximum width and height of my images is [656 x 656] and a test image is [450 x 656], I create a zero-filled np.ndarray of [656 x 656] and fill the [450 x 656] region with the test image's RGB values (i.e. I apply the concept of zero-padding).
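
A minimal sketch of that zero-padding step with NumPy (the 656 x 656 maximum size comes from my own data, and the helper name is illustrative):

MAX_H, MAX_W = 656, 656                      # maximum height/width over my training/validation images

def padToMax(image):
    # image: [h, w, 3] array with h <= MAX_H and w <= MAX_W
    h, w, _ = image.shape
    padded = np.zeros((MAX_H, MAX_W, 3), dtype=image.dtype)
    padded[:h, :w, :] = image                # fill the top-left region with the test image RGB
    return padded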
