
tf.nn.depthwise_conv2d is too slow. Is this normal?

I am trying out a recent arXiv work called "Factorized CNN",

which mainly argues that spatially separated convolution (depthwise convolution), together with channel-wise linear projection (1x1 convolution), can speed up the convolution operation.

this is the figure for their conv layer architecture

I found out that I can implement this architecture with tf.nn.depthwise_conv2d and 1x1 convolution, or with tf.nn.separable_conv2d.
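For the second option, the two steps can be fused into a single op. A minimal sketch (shapes chosen to match the 64-channel layer in my code below; the input tensor and filters here are random placeholders, not trained weights):

```python
import tensorflow as tf

# Placeholder input and filters matching the 64-channel layer discussed below.
x = tf.random.normal([1, 32, 32, 64])                # NHWC input
depthwise_filter = tf.random.normal([3, 3, 64, 1])   # one 3x3 filter per input channel
pointwise_filter = tf.random.normal([1, 1, 64, 64])  # 1x1 channel projection

# Fused depthwise + pointwise convolution in a single call.
y = tf.nn.separable_conv2d(x, depthwise_filter, pointwise_filter,
                           strides=[1, 1, 1, 1], padding='SAME')
print(y.shape)  # (1, 32, 32, 64)
```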

below is my implementation:

 import numpy as np
 import tensorflow as tf

 # conv filter for depthwise convolution (channel multiplier 1)
 depthwise_filter = tf.get_variable("depth_conv_w", [3, 3, 64, 1],
     initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 9 / 32)))
 # conv filter for linear channel projection
 pointwise_filter = tf.get_variable("point_conv_w", [1, 1, 64, 64],
     initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / 1 / 64)))
 conv_b = tf.get_variable("conv_b", [64], initializer=tf.constant_initializer(0))

 # depthwise convolution, with multiplier 1
 conv_tensor = tf.nn.relu(tf.nn.depthwise_conv2d(tensor, depthwise_filter, [1, 1, 1, 1], padding='SAME'))
 # linear channel projection with 1x1 convolution
 conv_tensor = tf.nn.bias_add(tf.nn.conv2d(conv_tensor, pointwise_filter, [1, 1, 1, 1], padding='VALID'), conv_b)
 # residual connection
 tensor = tf.add(tensor, conv_tensor)

In terms of mult-adds, this should be roughly 8 times cheaper than the original 3x3, 64-to-64-channel convolution.
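The expected saving is easy to check with a back-of-the-envelope mult-add count per output pixel (an arithmetic sanity check, not a benchmark):

```python
h = w = 3          # spatial kernel size
c_in = c_out = 64  # input and output channels

# Standard convolution: every output channel reads all input channels.
full = h * w * c_in * c_out         # 36864 mult-adds per output pixel

# Factorized: depthwise 3x3 (one filter per channel) + 1x1 projection.
depthwise = h * w * c_in            # 576
pointwise = c_in * c_out            # 4096
factorized = depthwise + pointwise  # 4672

print(full / factorized)  # ~7.9, i.e. roughly 8x fewer mult-adds
```

Note the 1x1 projection dominates the factorized cost, which is why the ratio is closer to 8x than to the 9x you might expect from the spatial part alone.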

However, I do not see any performance improvement.

I must assume that I am doing this wrong, or that there is something wrong with TensorFlow's implementation.

Since there are few examples using depthwise_conv2d, I am leaving this question here.

Is this slow speed normal, or is there a mistake somewhere?

Currently, the depthwise conv2d implementation does not fully exploit the GPU's parallelism, so you will need to wait for a faster implementation in the future. For example, in Caffe there is a faster third-party implementation of this kernel: https://github.com/yonghenglh6/DepthwiseConvolution

Depthwise convolutions provide significant performance benefits owing to the reduction in both parameters and mult-adds. However, training depthwise convolution layers with GPUs is slow in current deep learning frameworks because their implementations cannot fully utilize the GPU capacity.
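One way to see this gap on your own machine is to time the two variants directly. A rough sketch using TF 2.x eager mode with placeholder tensors (absolute numbers will vary by device; the point is that the measured ratio is driven by kernel quality, not by the mult-add count):

```python
import time
import tensorflow as tf

# Placeholder data: batch of 8 feature maps with 64 channels.
x = tf.random.normal([8, 128, 128, 64])
full_filter = tf.random.normal([3, 3, 64, 64])
dw = tf.random.normal([3, 3, 64, 1])
pw = tf.random.normal([1, 1, 64, 64])

def bench(fn, iters=20):
    fn()  # warm-up run (triggers any one-time kernel setup)
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

t_full = bench(lambda: tf.nn.conv2d(x, full_filter, strides=1, padding='SAME'))
t_sep = bench(lambda: tf.nn.separable_conv2d(x, dw, pw,
                                             strides=[1, 1, 1, 1], padding='SAME'))
print(f"full conv:      {t_full * 1e3:.2f} ms")
print(f"separable conv: {t_sep * 1e3:.2f} ms")
```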

https://arxiv.org/pdf/1803.09926.pdf

