Conv1D confusion in Tensorflow

Question

I'm trying to implement a paper and have hit a wall with some dimensionality problems. My input is mono audio data: 128 frames, each 50 ms of audio sampled at 16 kHz, are fed into the network. So my input shape is [128, 0.05*16000, 1] = [128, 800, 1]. Here are the layer details:

1.) conv-bank block: Conv1d-bank-8, LReLU, IN (instance normalization). I achieve this using:

bank_width = 8
conv_bank_outputs = tf.concat([tf.layers.conv1d(input, 1, k, activation=tf.nn.leaky_relu, padding="same") for k in range(1, bank_width + 1)], axis=-1)

2.) conv-block: C-512-5, LReLU --> C-512-5, stride=2, LReLU, IN, RES (residual)

This is where I get stuck: the shape of the second convolution's output does not match the shape of the input to layer (2). I can't get my head around it.

I achieve this using:

block_1 = tf.layers.conv1d(input,filters=512,kernel_size=5,activation=tf.nn.leaky_relu,padding="same")
block_2 = tf.layers.conv1d(block_1,filters=512,kernel_size=5,strides=2,activation=tf.nn.leaky_relu,padding="same")
IN = tf.contrib.layers.instance_norm(block_2)
RES = IN + input

Error: ValueError: Dimensions must be equal, but are 400 and 800 for 'add' (op: 'Add') with input shapes: [128,400,512], [128,800,1024].

Answer 1

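When you run conv1d on block_1 with strides=2, the time dimension is halved (800 -> 400), because with stride 2 the convolution is evaluated only at every other position. You have also changed the number of channels (1024 -> 512). So IN has shape [128,400,512] while input still has shape [128,800,1024], and the two cannot be added. The usual workaround is to pass the input on the residual path through a 1x1 conv with stride 2 and 512 filters, so that it matches the main branch, and add that instead of the raw input. I can be more specific if you can share the paper.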
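To make the shape arithmetic concrete, here is a minimal sketch in plain NumPy rather than TensorFlow, so it runs without the TF 1.x API. The helper names `same_conv_output_length` and `project_shortcut` and the random weights are made up for illustration. The batch size is reduced from 128 to 4 to keep it light. The only point is to show that a strided 1x1 projection gives the residual path the same shape as the main branch.

```python
import math
import numpy as np


def same_conv_output_length(length, stride):
    """Output length of a convolution with padding="same": ceil(length / stride)."""
    return math.ceil(length / stride)


def project_shortcut(x, weights, stride=2):
    """1x1 convolution with a stride, written with NumPy (no bias, no activation).

    x:       [batch, time, in_channels]
    weights: [in_channels, out_channels] (hypothetical, random in this sketch)
    A kernel of size 1 only mixes channels, so the strided version reduces to
    keeping every `stride`-th time step and applying a matrix multiply to each.
    """
    return x[:, ::stride, :] @ weights


# Time and channel sizes taken from the error message; batch reduced to 4.
batch, time, in_channels, out_channels = 4, 800, 1024, 512

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, time, in_channels)).astype(np.float32)
w = rng.standard_normal((in_channels, out_channels)).astype(np.float32)

# Shape produced by the stride-2, 512-filter convolution on the main branch.
main_branch_shape = (batch, same_conv_output_length(time, 2), out_channels)

# Residual path after the strided 1x1 projection.
shortcut = project_shortcut(x, w)

print(main_branch_shape)  # (4, 400, 512)
print(shortcut.shape)     # (4, 400, 512)
```

In the code from the question, the equivalent would presumably be something like shortcut = tf.layers.conv1d(input, filters=512, kernel_size=1, strides=2, padding="same") followed by RES = IN + shortcut. Whether the paper projects the shortcut this way or uses something else (for example pooling) is not something I can tell without seeing it.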
