
Iteratively build tensor in Tensorflow

Let's say I have a function that takes a Tensor as input (of a given dimensionality) and returns another Tensor as output. I would like to apply that function to a batch of inputs and have it return a batch of outputs, so both the input and the output would have one more dimension.

I could write a tf.while_loop to execute my function on all the inputs in the batch, but I am unsure how to store the output of each single element of the batch. I have an idea of how to do this, which should also clarify what I am trying to do, but I am not sure it would be optimal.

import tensorflow as tf

batch = tf.random.uniform([4, 3, 2])  # batch of size 4 of (3,2)-shaped tensors

def apply_to_batch(batch):
    output = tf.zeros([0, 5])  # the output should be a batch of 4 (4,5)-shaped tensors;
    # concatenate the single outputs to this tensor, then reshape it
    for i in tf.range(len(batch)):
        # MyVeryNiceFunction returns a (4,5)-shaped tensor
        output = tf.concat((output, MyVeryNiceFunction(batch[i])), 0)
    return tf.reshape(output, (4, 4, 5))  # (batch_size, per-example output shape)

This code certainly gives the output I want, but would it allow each execution of the loop to be parallelized? Is there a better way to do this? Is there a proper data structure that would let me store the output of each loop execution and then efficiently build the output Tensor from it?
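For reference, here is a sketch of the same loop using tf.TensorArray, which as far as I understand is TensorFlow's structure for accumulating per-iteration results inside a loop (MyVeryNiceFunction is still just my placeholder):

import tensorflow as tf

batch = tf.random.uniform([4, 3, 2])
ta = tf.TensorArray(dtype=tf.float32, size=4)  # one slot per batch element
for i in tf.range(4):
    # write() returns the updated TensorArray, so it must be reassigned
    ta = ta.write(i, MyVeryNiceFunction(batch[i]))
output = ta.stack()  # stacks the four (4,5) tensors into a (4,4,5) tensor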

In general, iterating over a dimension is very likely to be the wrong approach. In TF (as in MATLAB and NumPy), the goal is vectorization: describing your operations in a way that touches all elements of the batch at the same time.

For example, let's say my dataset is composed of length-2 vectors, and I have a batch of 4 of them.

data = tf.convert_to_tensor([[1,2], [3,4], [5,6], [7,8]], tf.float32)
>>> data
<tf.Tensor: shape=(4, 2), dtype=float32, numpy=
array([[1., 2.],
       [3., 4.],
       [5., 6.],
       [7., 8.]], dtype=float32)>

If you wanted to add an element to each vector in a vectorized way, for instance some statistic such as the variance, you'd do this. Notice how you are constantly thinking about tensor shapes and dimensions and how to concat/append tensors. It's common to document tensor shapes constantly and even to assert them. Welcome to TF programming.

vars = tf.math.reduce_variance(data, axis=1, keepdims=True)
tf.debugging.assert_equal(tf.shape(vars), [4, 1])
tf.concat(values=[data, vars], axis=1)


<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[1.  , 2.  , 0.25],
       [3.  , 4.  , 0.25],
       [5.  , 6.  , 0.25],
       [7.  , 8.  , 0.25]], dtype=float32)>
