
How to effectively use tf.bucket_by_sequence_length in TensorFlow?

I'm trying to use tf.bucket_by_sequence_length() from TensorFlow, but I cannot quite figure out how to make it work.

Basically, it should take sequences of different lengths as input and produce buckets of sequences as output, but it does not seem to work this way.

From this discussion (https://github.com/tensorflow/tensorflow/issues/5609), I have the impression that the function needs to be fed from a queue, one sequence at a time, but that is not clear.

The function's documentation can be found here: https://www.tensorflow.org/versions/r0.12/api_docs/python/contrib.training/bucketing#bucket_by_sequence_length

Indeed, the input tensor needs to come from a queue, e.g. tf.FIFOQueue().dequeue(), or tf.TensorArray().read() with an index dequeued from tf.train.range_input_producer().

This notebook explains it quite well:

https://github.com/wcarvalho/jupyter_notebooks/blob/ebe762436e2eea1dff34bbd034898b64e4465fe4/tf.bucket_by_sequence_length/bucketing%20practice.ipynb

My answer below is based on TensorFlow 2.0. You may be using an older version of TensorFlow, but if you happen to use the newer one, you can use the bucket_by_sequence_length API in the following manner.

import tensorflow as tf


# bucket_by_sequence_length calls this on each element to get its length,
# which decides the bucket the element goes into.
def _element_length_fn(x, y=None):
    return tf.shape(x)[0]


# Upper length boundaries for the buckets. Sentences are assigned to buckets
# according to these boundaries. You can define as many boundaries as you want,
# but make sure the largest boundary exceeds the maximum sentence length in
# your dataset (required when pad_to_bucket_boundary=True).
boundaries = [upper_boundary_for_batch]  # placeholder: supply your own boundaries

# Batch size per bucket. Per the documentation, the list length must be
# len(bucket_boundaries) + 1.
# https://www.tensorflow.org/api_docs/python/tf/data/experimental/bucket_by_sequence_length
# The same batch_size is used for every bucket here, but it can be tuned per bucket.
batch_sizes = [batch_size] * (len(boundaries) + 1)  # placeholder: batch_size is yours to define

# bucket_by_sequence_length returns a dataset transformation function that has
# to be applied with dataset.apply. The important parameter is
# pad_to_bucket_boundary: if True, sentences are padded to the bucket boundary
# minus 1; if False, they are padded to the longest sentence in the batch.
# The default padding value is 0, so nothing extra needs to be supplied here.
dataset = dataset.apply(
    tf.data.experimental.bucket_by_sequence_length(
        _element_length_fn,
        boundaries,
        batch_sizes,
        drop_remainder=True,
        pad_to_bucket_boundary=True,
    )
)
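
To make the effect of the boundaries and of pad_to_bucket_boundary easier to see, here is a minimal pure-Python sketch of the bucketing idea. It is not TensorFlow's implementation: the helper names (bucket_id, pad_batch, bucket_and_pad) are hypothetical, a single batch size is used for all buckets, and the rule "bucket i covers lengths in [boundaries[i-1], boundaries[i]), padded to the boundary minus 1" is taken from the tf.data documentation as I understand it.

```python
from bisect import bisect_right


def bucket_id(length, boundaries):
    """Return the bucket index for a sequence of the given length.

    Bucket i covers lengths in [boundaries[i-1], boundaries[i]);
    the last bucket has no upper bound.
    """
    return bisect_right(boundaries, length)


def pad_batch(batch, width, pad_value=0):
    """Right-pad every sequence in the batch to the same width."""
    return [seq + [pad_value] * (width - len(seq)) for seq in batch]


def bucket_and_pad(sequences, boundaries, batch_size,
                   pad_to_bucket_boundary=False, drop_remainder=False):
    """Group sequences into length buckets and emit padded batches."""
    pending = {}  # bucket index -> sequences waiting for a full batch
    batches = []

    def emit(b):
        batch = pending.pop(b)
        if pad_to_bucket_boundary:
            width = boundaries[b] - 1  # longest length allowed in bucket b
        else:
            width = max(len(seq) for seq in batch)
        batches.append(pad_batch(batch, width))

    for seq in sequences:
        b = bucket_id(len(seq), boundaries)
        if pad_to_bucket_boundary and b == len(boundaries):
            # The overflow bucket has no boundary to pad to.
            raise ValueError("sequence is not shorter than the largest boundary")
        pending.setdefault(b, []).append(list(seq))
        if len(pending[b]) == batch_size:
            emit(b)

    if not drop_remainder:
        # Flush partially filled buckets, lowest bucket first.
        for b in sorted(pending):
            emit(b)
    return batches


sentences = [[1, 2], [3, 4, 5], [6], [7, 8, 9, 10, 11]]
for batch in bucket_and_pad(sentences, boundaries=[4, 8], batch_size=2):
    print(batch)
```

With boundaries [4, 8], the sentences of length 1 to 3 share the first bucket and the sentence of length 5 lands in the second one. Setting pad_to_bucket_boundary=True in this sketch pads the first bucket to width 3 and the second to width 7, regardless of what each batch actually contains.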
