简体   繁体   中英

Conv1D confusion in Tensorflow

Trying to implement a paper and running into some brick-walls due to some dimensionality problems. My input is mono audio data where 128 frames of 50ms of 16kHz sampled audio is fed into the network. So my input shape is: [128,0.005*16000, 1] Here's the layer details -

1.) conv-bank block : Conv1d-bank-8, LeReLU, IN (instance normalization) I achieve this using :

bank_width = 8
conv_bank_outputs = tf.concat([ tf.layers.conv1d(input,1,k,activation=tf.nn.leaky_relu,padding="same") for k in range(1, bank_width + 1)], axes = -1)

2.) conv-block: C-512-5, LReLu --> C-512-5,stride=2, LReLu, IN, RES (Residual)

This is where I get stuck, the shapes of the output of second convolution and input to the (2) layer is mismatched. I can't get my head around it.

I achieve this using:

block_1 = tf.layers.conv1d(input,filters=512,kernel_size=5,activation=tf.nn.leaky_relu,padding="same")
block_2 = tf.layers.conv1d(block_1,filters=512,kernel_size=5,strides=2,activation=tf.nn.leaky_relu,padding="same")
IN = tf.contrib.layers.instance_norm(block_2)
RES = IN + input

Error: ValueError: Dimensions must be equal, but are 400 and 800 for 'add' (op: 'Add') with input shapes: [128,400,512], [128,800,1024].

When you run conv1d on block1 with stride = 2 , input data is halved as conv1d effectively samples only alternate numbers and also you have changed number of channels. This is usually worked around by downsampling input by 1x1 conv with stride 2 and filters 512, though I can be more specific if you can share the paper.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM