
tf.nn.depthwise_conv2d is too slow. Is this normal?

I am trying out a recent arXiv work called "Factorized CNN",

which mainly argues that spatially separated convolution (depthwise convolution), together with channel-wise linear projection (1x1 convolution), can speed up the convolution operation.

this is the figure for their conv layer architecture

I found out that I can implement this architecture with tf.nn.depthwise_conv2d and 1x1 convolution, or with tf.nn.separable_conv2d.
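For the second option, the two steps can be fused into a single op. A minimal sketch (shapes chosen to match the 64-channel layer in my code below; the input tensor and filters here are random placeholders, not trained weights):

```python
import tensorflow as tf

# Placeholder input and filters matching the 64-channel layer discussed below.
x = tf.random.normal([1, 32, 32, 64])                # NHWC input
depthwise_filter = tf.random.normal([3, 3, 64, 1])   # one 3x3 filter per input channel
pointwise_filter = tf.random.normal([1, 1, 64, 64])  # 1x1 channel projection

# Fused depthwise + pointwise convolution in a single call.
y = tf.nn.separable_conv2d(x, depthwise_filter, pointwise_filter,
                           strides=[1, 1, 1, 1], padding='SAME')
print(y.shape)  # (1, 32, 32, 64)
```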

below is my implementation:

 import numpy as np
 import tensorflow as tf

 # conv filter for depthwise convolution (channel multiplier 1)
 depthwise_filter = tf.get_variable("depth_conv_w", [3, 3, 64, 1],
     initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9 / 32)))
 # conv filter for linear channel projection
 pointwise_filter = tf.get_variable("point_conv_w", [1, 1, 64, 64],
     initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 1 / 64)))
 conv_b = tf.get_variable("conv_b", [64], initializer=tf.constant_initializer(0))

 # depthwise convolution, with multiplier 1
 conv_tensor = tf.nn.relu(tf.nn.depthwise_conv2d(tensor, depthwise_filter, [1, 1, 1, 1], padding='SAME'))
 # linear channel projection with 1x1 convolution
 conv_tensor = tf.nn.bias_add(tf.nn.conv2d(conv_tensor, pointwise_filter, [1, 1, 1, 1], padding='VALID'), conv_b)
 # residual connection
 tensor = tf.add(tensor, conv_tensor)

In terms of mult-adds, this should be roughly 8 times cheaper than the original 3x3, 64-to-64-channel convolution.
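The expected saving is easy to check with a back-of-the-envelope mult-add count per output pixel (an arithmetic sanity check, not a benchmark):

```python
h = w = 3          # spatial kernel size
c_in = c_out = 64  # input and output channels

# Standard convolution: every output channel reads all input channels.
full = h * w * c_in * c_out         # 36864 mult-adds per output pixel

# Factorized: depthwise 3x3 (one filter per channel) + 1x1 projection.
depthwise = h * w * c_in            # 576
pointwise = c_in * c_out            # 4096
factorized = depthwise + pointwise  # 4672

print(full / factorized)  # ~7.9, i.e. roughly 8x fewer mult-adds
```

Note the 1x1 projection dominates the factorized cost, which is why the ratio is closer to 8x than to the 9x you might expect from the spatial part alone.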

However, I do not see any performance improvement.

I must assume that I am doing this wrong, or that there is something wrong with TensorFlow's implementation.

Since there are few examples using depthwise_conv2d, I am leaving this question here.

Is this slow speed normal, or is there a mistake somewhere?

Currently, the depthwise conv2d implementation does not fully exploit the GPU's parallelism, so you will need to wait for a faster implementation in the future. For example, in Caffe there is a faster third-party implementation of this kernel: https://github.com/yonghenglh6/DepthwiseConvolution

Depthwise convolutions provide significant performance benefits owing to the reduction in both parameters and mult-adds. However, training depthwise convolution layers with GPUs is slow in current deep learning frameworks because their implementations cannot fully utilize the GPU capacity.
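One way to see this gap on your own machine is to time the two variants directly. A rough sketch using TF 2.x eager mode with placeholder tensors (absolute numbers will vary by device; the point is that the measured ratio is driven by kernel quality, not by the mult-add count):

```python
import time
import tensorflow as tf

# Placeholder data: batch of 8 feature maps with 64 channels.
x = tf.random.normal([8, 128, 128, 64])
full_filter = tf.random.normal([3, 3, 64, 64])
dw = tf.random.normal([3, 3, 64, 1])
pw = tf.random.normal([1, 1, 64, 64])

def bench(fn, iters=20):
    fn()  # warm-up run (triggers any one-time kernel setup)
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

t_full = bench(lambda: tf.nn.conv2d(x, full_filter, strides=1, padding='SAME'))
t_sep = bench(lambda: tf.nn.separable_conv2d(x, dw, pw,
                                             strides=[1, 1, 1, 1], padding='SAME'))
print(f"full conv:      {t_full * 1e3:.2f} ms")
print(f"separable conv: {t_sep * 1e3:.2f} ms")
```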

https://arxiv.org/pdf/1803.09926.pdf

