
tf.nn.conv2d_transpose(): doubling width and height for dynamic input tensor shapes

When I use tf.nn.conv2d_transpose() to get a layer output with doubled width and height and halved depth, it works as long as I specify a fixed [batch, width, height, channels] shape (for both input and output).

With the batch dimension set to None, training works well for any specified batch size, and validation works well for multiple images (or a single image).

Now I'm trying to build an encoder-decoder network using [128 x 128 x 3] training images. (These training images are patches cropped from [w x h x 3] original images.)

The input shape is [128 x 128 x 3] and the output shape is [128 x 128 x 3]. The first layer of the encoder-decoder structure is a convolution layer with a 3x3 kernel, stride 1, and padding 1.
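For context, a minimal sketch of the kind of first layer described (the function name, variable names, and output channel count here are illustrative, not from the original code; it assumes TF 1.x with NHWC layout, where a 3x3 kernel with stride 1 and padding 1 corresponds to "SAME" padding):

def FirstConvLayer(x, out_channels=64):
    # 3x3 kernel, stride 1, SAME padding: width and height stay unchanged.
    in_channels = x.get_shape().as_list()[-1]
    kernel = tf.Variable(tf.random_normal([3, 3, in_channels, out_channels], stddev=0.01))
    return tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding="SAME")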

All of the above works well for the specified width and height (128 x 128).

However, after training finishes on [128 x 128 x 3] patches, I'd like to run inference on a [w x h x 3] image with the trained network.

I believe all of the operations (convolution, max pooling) give correct results, except the transpose convolution.
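As a quick check (a sketch, not part of the original network), convolution and max pooling build fine when the spatial dimensions are None, because they never need the static width and height at graph-construction time:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, None, None, 3])
k = tf.Variable(tf.random_normal([3, 3, 3, 16], stddev=0.01))
conv = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding="SAME")                       # shape (?, ?, ?, 16)
pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME") # shape (?, ?, ?, 16)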

When I run inference on images with the fixed shape [128 x 128 x 3]:

InputImagesTensor = tf.placeholder(tf.float32, [None, 128, 128, 3], name='InputImages')
ResultImages = libs.Network(InputImagesTensor)
saver = tf.train.Saver()
w = 128
h = 128

sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.restore(sess, 'output.ckpt')
for i in range(0, len(Datas.InputImageNameList)):
    temp = np.resize(getResizedImage(Datas.InputImageList[i]), (1, 128, 128, 3))
    resultimg = sess.run(ResultImages, feed_dict={InputImagesTensor: temp})

with this transpose convolution inside the network:

def Transpose2d(input, inC, outC):
    # Static shape: w and h must be known at graph-construction time.
    b, w, h, c = input.shape

    # The batch size is taken dynamically, but width and height are doubled statically.
    batch_size = tf.shape(input)[0]
    deconv_shape = tf.stack([batch_size, int(w * 2), int(h * 2), outC])
    kernel = tf.Variable(tf.random_normal([2, 2, outC, inC], stddev=0.01))
    transConv = tf.nn.conv2d_transpose(input, kernel, output_shape=deconv_shape,
                                       strides=[1, 2, 2, 1], padding="SAME")

    return transConv

Now I tried to convert the fixed width and height to a dynamic width and height. I thought this would work (however, it failed).

Change

InputImagesTensor = tf.placeholder(tf.float32, [None, 128, 128, 3], name='InputImages')
temp = np.resize(getResizedImage(Datas.InputImageList[i]), (1, 128, 128, 3))

to

InputImagesTensor = tf.placeholder(tf.float32, [None, None, None, 3], name='InputImages')
temp = np.resize(getResizedImage(Datas.InputImageList[i]), (1, w, h, 3))

However, this line gives an error:

deconv_shape = tf.stack([batch_size, int(w*2), int(h*2), outC])

TypeError: __int__ returned non-int (type NoneType)

I guess this is because a None dimension cannot be doubled to 2*None.
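A minimal sketch reproducing the failure (assuming the placeholder is declared with None spatial dimensions, as above): input.shape gives unknown Dimension objects for w and h, and calling int() on them raises the TypeError from the question.

import tensorflow as tf

input = tf.placeholder(tf.float32, [None, None, None, 3])
b, w, h, c = input.shape   # w and h are unknown Dimensions (value None)
doubled = w * 2            # still an unknown Dimension
int(doubled)               # raises: TypeError: __int__ returned non-int (type NoneType)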

How can I do this? Is it possible?

Self answer...

I could not come up with a suitable 'standard' solution that keeps w and h as None for the transpose convolution.

However, I solved the problem by setting the transpose convolution's shape to the maximum shape over my training/validation images. For example, if the maximum width and height of my images is [656 x 656] and a test image is [450 x 656], I create a zero-filled np.ndarray of [656 x 656] and fill the [450 x 656] region with the test image's RGB values (the concept of zero-padding).
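A sketch of that workaround (MAX_W, MAX_H, pad_to_max, and test_image are illustrative names, not from the original code; sess, ResultImages, and InputImagesTensor are from the inference snippet above, with the placeholder now declared as [None, MAX_W, MAX_H, 3]):

import numpy as np

MAX_W, MAX_H = 656, 656   # maximum width/height over training/validation images

def pad_to_max(img):
    # img: [w, h, 3] test image; place it in the corner of a zero-filled canvas.
    w, h, _ = img.shape
    canvas = np.zeros((MAX_W, MAX_H, 3), dtype=np.float32)
    canvas[:w, :h, :] = img
    return canvas, (w, h)

# Usage: pad the image, run the fixed-shape network, then crop the result back.
padded, (w, h) = pad_to_max(test_image)
result = sess.run(ResultImages, feed_dict={InputImagesTensor: padded[np.newaxis]})
result = result[0, :w, :h, :]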
