
Tensorflow: Where is tf.nn.conv2d Actually Executed?

I am curious about the Tensorflow implementation of tf.nn.conv2d(...) . To call it, one simply runs tf.nn.conv2d(...) . However, I'm going down the rabbit hole trying to see where it is actually executed. The call chain is as follows (where each arrow indicates the function it ultimately calls):

tf.nn.conv2d(...) -> tf.nn_ops.conv2d(...) -> tf.gen_nn_ops.conv2d(...) -> _op_def_lib.apply_op("Conv2D", ...) -> ?

I am familiar with Tensorflow's implementation of LSTMs and the ability to easily manipulate them as one sees fit. Is the function that performs the conv2d() calculation written in Python, and if so, where is it? Can I see where and how the strides are applied?

TL;DR: The implementation of tf.nn.conv2d() is written in C++, which invokes optimized code using either Eigen (on CPU) or the cuDNN library (on GPU). You can find the implementation here .

The chain of functions that you mentioned in the question (from tf.nn.conv2d() down) consists of Python functions for building a TensorFlow graph; these do not invoke the implementation. Recall that, in TensorFlow, you first build a symbolic graph, then execute it.
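The build-then-run split can be illustrated with a toy model in plain Python (this is a sketch of the idea, not TensorFlow's actual code): calling an op builder only records a node in a graph, and nothing is computed until a separate run step walks the graph.

```python
# Toy model of deferred execution (illustrative only, not TensorFlow internals).

class Node:
    def __init__(self, fn, inputs):
        self.fn = fn          # Python callable standing in for a C++ kernel
        self.inputs = inputs  # upstream Node objects or plain constants

def add(a, b):
    # Build time: this records a node; no addition happens here.
    return Node(lambda x, y: x + y, [a, b])

def run(node):
    # "Execution": walk the graph and only now call the kernels.
    args = [run(i) if isinstance(i, Node) else i for i in node.inputs]
    return node.fn(*args)

graph = add(add(1, 2), 4)   # builds three nodes, computes nothing
print(run(graph))           # only now is the arithmetic performed: 7
```

The same separation is why inspecting the Python source of tf.nn.conv2d() only shows graph-building code.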

The implementation of tf.nn.conv2d() is only executed when you call Session.run() , passing a Tensor whose value depends on the result of some convolution. For example:

import tensorflow as tf

input = tf.placeholder(tf.float32)
filter = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1))
conv = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

sess = tf.Session()
sess.run(tf.global_variables_initializer())
result = sess.run(conv, feed_dict={input: ...})  # <== Execution happens here.

Invoking sess.run(...) tells TensorFlow to run all the ops that are needed to compute the value of conv , including the convolution itself. The path from here to the implementation is somewhat complicated, but goes through the following steps:

  1. sess.run() calls the TensorFlow backend to fetch the value of conv .
  2. The backend prunes the computation graph to work out what nodes must be executed, and places the nodes on the appropriate devices (CPU or GPU).
  3. Each device is instructed to execute its subgraph, using an executor .
  4. The executor eventually invokes the tensorflow::OpKernel that corresponds to the convolution operator, by calling its Compute() method.
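Step 4 can be sketched with a toy executor in plain Python (illustrative only, not TensorFlow's actual code): once the runtime knows which nodes to run, an executor walks the graph in dependency order and calls each node's kernel, playing the role of Compute().

```python
# Toy executor sketch (not TensorFlow internals).

class OpKernel:
    """Stand-in for tensorflow::OpKernel; `compute` plays the role of Compute()."""
    def __init__(self, name, compute, inputs=()):
        self.name = name
        self.compute = compute
        self.inputs = list(inputs)  # upstream OpKernel nodes

def execute(fetch):
    """Run kernels in dependency (topological) order; return the fetched value."""
    cache = {}
    def visit(node):
        if node.name not in cache:
            args = [visit(i) for i in node.inputs]
            cache[node.name] = node.compute(*args)  # the "Compute()" call
        return cache[node.name]
    return visit(fetch)

a = OpKernel("a", lambda: 3)
b = OpKernel("b", lambda: 4)
mul = OpKernel("mul", lambda x, y: x * y, [a, b])
print(execute(mul))  # 12
```

In the real system the kernels are C++ classes registered per device, and the executor dispatches to the CPU (Eigen) or GPU (cuDNN) kernel that was placed in step 2.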

The "Conv2D" OpKernel is implemented here , and its Compute() method is here . Because this op is performance-critical for many workloads, the implementation is quite complicated, but the basic idea is that the computation is offloaded to either the Eigen Tensor library (if running on CPU) or cuDNN's optimized GPU implementation.

TensorFlow programs consist of two discrete sections:

  • Building the computational graph.

tf.nn.conv2d(...) -> tf.nn_ops.conv2d(...) -> tf.gen_nn_ops.conv2d(...) -> _op_def_lib.apply_op("Conv2D", ...) -> graph.create_op -> register op into graph
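The end of this build-time chain can be sketched as follows (a toy model, not TensorFlow's actual code; the class and method names are simplified stand-ins): an apply_op-style helper ultimately just appends an op record to the graph's node list, and no numerical work happens at this stage.

```python
# Toy sketch of op registration at graph-build time (not TensorFlow internals).

class Graph:
    def __init__(self):
        self.ops = []

    def create_op(self, op_type, inputs, attrs):
        op = {"type": op_type, "inputs": inputs, "attrs": attrs}
        self.ops.append(op)  # "register op into graph"
        return op

graph = Graph()
# Analogous to _op_def_lib.apply_op("Conv2D", ...):
graph.create_op("Conv2D", inputs=["input", "filter"],
                attrs={"strides": [1, 1, 1, 1], "padding": "SAME"})
print([op["type"] for op in graph.ops])  # ['Conv2D']
```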

  • Running the computational graph.

sess = tf.Session(target) -> sess.run(conv2d) -> master prunes the full graph to a client graph -> master splits the client graph by task into graph partitions -> graph partitions are registered to workers -> each worker splits its subgraph by device into graph partitions -> master notifies all workers to run their graph partitions -> each worker notifies all of its devices to run their graph partitions -> the executor runs the ops on each device in topological order.
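The first runtime step, pruning the full graph down to the client graph, can be sketched in plain Python (illustrative only, not TensorFlow's actual code): keep only the ops that the fetched tensor transitively depends on, and discard the rest.

```python
# Toy sketch of graph pruning (not TensorFlow internals).

def prune(full_graph, fetch):
    """full_graph maps op name -> list of input op names."""
    needed = set()
    stack = [fetch]
    while stack:
        op = stack.pop()
        if op not in needed:
            needed.add(op)
            stack.extend(full_graph[op])
    return {op: ins for op, ins in full_graph.items() if op in needed}

full_graph = {
    "input": [], "filter": [],
    "conv2d": ["input", "filter"],
    "unrelated": ["input"],   # not needed to compute conv2d
}
print(sorted(prune(full_graph, "conv2d")))  # ['conv2d', 'filter', 'input']
```

The later splitting steps partition this pruned graph by task and then by device, so that each executor only sees the subgraph placed on its own device.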

For each op, the executor invokes the kernel implementation to compute the op.

The kernel implementation of tf.nn.conv2d() is written in C++, which invokes optimized code using either Eigen (on CPU) or the cuDNN library (on GPU).
