
Understanding tf.nn.depthwise_conv2d

From https://www.tensorflow.org/api_docs/python/tf/nn/depthwise_conv2d

Given a 4D input tensor ('NHWC' or 'NCHW' data formats) and a filter tensor of shape [filter_height, filter_width, in_channels, channel_multiplier] containing in_channels convolutional filters of depth 1, depthwise_conv2d applies a different filter to each input channel (expanding from 1 channel to channel_multiplier channels for each), then concatenates the results together. The output has in_channels * channel_multiplier channels

  1. What does "expanding from 1 channel to channel_multiplier channels for each" mean?
  2. Is it possible to have out_channels < in_channels?
  3. Is it possible to split the input tensor into groups, as in PyTorch https://pytorch.org/docs/stable/nn.html#conv2d ?

Example:

import tensorflow as tf
import numpy as np
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

np.random.seed(2020)

print('tf.__version__', tf.__version__)

def get_data_batch():
    bs = 2
    h = 3
    w = 3
    c = 4

    x_np = np.random.rand(bs, h, w, c)
    x_np = x_np.astype(np.float32)
    print('x_np.shape', x_np.shape)

    return x_np


def run_conv_dw():
    print('='*60)
    x_np = get_data_batch()
    in_channels = x_np.shape[-1]
    kernel_size = 3
    channel_multiplier = 1
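    # Written for TensorFlow 1.x: tf.Session and tf.get_variable live under
    # tf.compat.v1 in 2.x, and tf.contrib no longer exists there.
    with tf.Session() as sess:
        x_tf = tf.convert_to_tensor(x_np)
        # Filter shape: [filter_height, filter_width, in_channels, channel_multiplier].
        # Named w rather than filter to avoid shadowing the Python builtin.
        w = tf.get_variable('w1', [kernel_size, kernel_size, in_channels, channel_multiplier],
                            initializer=tf.contrib.layers.xavier_initializer())
        z_tf = tf.nn.depthwise_conv2d(x_tf, filter=w, strides=[1, 1, 1, 1], padding='SAME')

        sess.run(tf.global_variables_initializer())
        # x_tf is already a constant holding x_np, so no feed_dict is needed.
        z_np = sess.run(z_tf)
        print('z_np.shape', z_np.shape)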


if '__main__' == __name__:
    run_conv_dw()

channel_multiplier can't be a float (it is a dimension of the filter shape):

If channel_multiplier = 1:

x_np.shape (2, 3, 3, 4)
z_np.shape (2, 3, 3, 4)

If channel_multiplier = 2:

x_np.shape (2, 3, 3, 4)
z_np.shape (2, 3, 3, 8)

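In PyTorch terms:

  1. There is always one input channel per group and channel_multiplier output channels per group.
  2. Not in one step: the output always has in_channels * channel_multiplier channels, which is at least in_channels.
  3. See 1.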
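
To make point 1 concrete, here is a minimal NumPy sketch of what depthwise_conv2d computes. It is a stand-in for illustration, not TensorFlow's implementation: it assumes NHWC layout, stride 1 and VALID padding, and depthwise_conv2d_ref is a hypothetical helper name. The key detail is that output channel k * channel_multiplier + q is computed from input channel k only.

```python
import numpy as np

def depthwise_conv2d_ref(x, w):
    """NumPy model of a depthwise convolution (NHWC, stride 1, VALID padding).

    x: [batch, height, width, in_channels]
    w: [filter_height, filter_width, in_channels, channel_multiplier]
    """
    n, h, wd, c = x.shape
    fh, fw, wc, m = w.shape
    assert c == wc, "filter and input must agree on in_channels"
    oh, ow = h - fh + 1, wd - fw + 1
    out = np.zeros((n, oh, ow, c * m), dtype=x.dtype)
    for k in range(c):          # each input channel is its own group
        for q in range(m):      # and produces channel_multiplier outputs
            for i in range(oh):
                for j in range(ow):
                    patch = x[:, i:i + fh, j:j + fw, k]  # [n, fh, fw]
                    out[:, i, j, k * m + q] = np.sum(patch * w[:, :, k, q], axis=(1, 2))
    return out

rng = np.random.default_rng(2020)
x = rng.random((2, 5, 5, 4)).astype(np.float32)   # 4 input channels
w = rng.random((3, 3, 4, 2)).astype(np.float32)   # channel_multiplier = 2
z = depthwise_conv2d_ref(x, w)
print(z.shape)  # (2, 3, 3, 8)
```

Each of the 4 input channels expands to 2 output channels, so the channel count can only stay the same or grow.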

I see a way to emulate several input channels per group. For two input channels per group, apply depthwise_conv2d, split the resulting tensor in half along the channel axis (like cutting a deck of cards), and then sum the two halves elementwise (before ReLU etc.). Note that input channel i will be grouped with input channel i + in_channels/2.
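
The split-and-sum trick can be checked with a small NumPy sketch. The helpers depthwise and conv2d below are hypothetical stand-ins for the TensorFlow ops (NHWC, stride 1, VALID padding assumed), so this only illustrates the channel bookkeeping.

```python
import numpy as np

def depthwise(x, w):
    # x: [n, h, w, c], w: [fh, fw, c, m]; output channel k * m + q uses input channel k.
    n, h, wd, c = x.shape
    fh, fw, _, m = w.shape
    out = np.zeros((n, h - fh + 1, wd - fw + 1, c * m))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + fh, j:j + fw, :]  # [n, fh, fw, c]
            out[:, i, j, :] = np.einsum('nabk,abkq->nkq', patch, w).reshape(n, c * m)
    return out

def conv2d(x, w):
    # Ordinary convolution: x: [n, h, w, cin], w: [fh, fw, cin, cout].
    n, h, wd, _ = x.shape
    fh, fw, _, cout = w.shape
    out = np.zeros((n, h - fh + 1, wd - fw + 1, cout))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + fh, j:j + fw, :]
            out[:, i, j, :] = np.einsum('nabk,abko->no', patch, w)
    return out

rng = np.random.default_rng(0)
c, m = 4, 2
x = rng.random((1, 5, 5, c))
w = rng.random((3, 3, c, m))

z = depthwise(x, w)                       # [1, 3, 3, c * m]
first, second = np.split(z, 2, axis=-1)   # cut the "deck" in half
y = first + second                        # [1, 3, 3, c * m // 2]

# Group 0 pairs input channels 0 and c // 2 and owns output channels 0..m-1.
pair = [0, c // 2]
expected = conv2d(x[..., pair], w[:, :, pair, :])
print(np.allclose(y[..., :m], expected))
```

The result has in_channels / 2 groups, each with two input channels and channel_multiplier output channels.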


EDIT: The trick above is useful for small groups. For large ones, just split the input tensor into N parts, where N is the group count, apply conv2d to each part independently, then concatenate the results.
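
A rough sketch of the split, convolve, concatenate approach, again in NumPy with hypothetical helpers (conv2d and grouped_conv2d are illustrative names; NHWC, stride 1 and VALID padding are assumed). In TensorFlow the same steps would presumably map to tf.split, tf.nn.conv2d and tf.concat. The filter layout [fh, fw, in_channels // groups, out_channels] is an assumption made for this example.

```python
import numpy as np

def conv2d(x, w):
    """Ordinary NHWC convolution, stride 1, VALID padding (NumPy stand-in)."""
    n, h, wd, _ = x.shape
    fh, fw, _, cout = w.shape
    out = np.zeros((n, h - fh + 1, wd - fw + 1, cout))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + fh, j:j + fw, :]
            out[:, i, j, :] = np.einsum('nabk,abko->no', patch, w)
    return out

def grouped_conv2d(x, w, groups):
    """Split input channels into `groups` parts, convolve each, concatenate.

    w: [fh, fw, in_channels // groups, out_channels]; the output channels
    are divided evenly between the groups.
    """
    x_parts = np.split(x, groups, axis=-1)
    w_parts = np.split(w, groups, axis=-1)
    outs = [conv2d(xp, wp) for xp, wp in zip(x_parts, w_parts)]
    return np.concatenate(outs, axis=-1)

rng = np.random.default_rng(0)
x = rng.random((2, 5, 5, 4))   # 4 input channels
w = rng.random((3, 3, 2, 2))   # 2 groups: 2 input channels and 1 output channel each
y = grouped_conv2d(x, w, groups=2)
print(y.shape)  # (2, 3, 3, 2)
```

Unlike depthwise_conv2d, this form allows out_channels < in_channels, as the shapes above show (4 channels in, 2 out).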
