How to pad sequences in a tensor slice dataset in TensorFlow?
I have a tensor slice dataset made from two ragged tensors.
tensor_a looks like:
<tf.RaggedTensor [[3, 3, 5], [3, 3, 14, 4, 17, 20], [3, 14, 22, 17]]>
tensor_b looks like:
<tf.RaggedTensor [[-1, 1, -1], [-1, -1, 1, -1, -1, -1], [-1, 1, -1, 2]]>
(Rows at the same index in tensor_a and tensor_b have the same length.)
I created the dataset with:
dataset = tf.data.Dataset.from_tensor_slices((tensor_a, tensor_b))
dataset
<TensorSliceDataset element_spec=(RaggedTensorSpec(TensorShape([None]), tf.int64, 0, tf.int64), RaggedTensorSpec(TensorShape([None]), tf.int32, 0, tf.int64))>
How do I pad the sequences in my dataset? I've tried tf.pad and tf.keras.preprocessing.sequence.pad_sequences but haven't found the right way.
You could try something like this:
import tensorflow as tf

tensor_a = tf.ragged.constant([[3, 3, 5], [3, 3, 14, 4, 17, 20], [3, 14, 22, 17]])
tensor_b = tf.ragged.constant([[-1, 1, -1], [-1, -1, 1, -1, -1, -1], [-1, 1, -1, 2]])
dataset = tf.data.Dataset.from_tensor_slices((tensor_a, tensor_b))

# Length of the longest sequence in the dataset.
max_length = max(list(dataset.map(lambda x, y: tf.shape(x)[0])))

def pad(x, y):
    # Append zeros so every sequence reaches max_length (post-padding).
    # Using x.dtype / y.dtype keeps this working when the two tensors have different dtypes.
    x = tf.concat([x, tf.zeros((int(max_length-tf.shape(x)[0]),), dtype=x.dtype)], axis=0)
    y = tf.concat([y, tf.zeros((int(max_length-tf.shape(y)[0]),), dtype=y.dtype)], axis=0)
    return x, y

dataset = dataset.map(pad)
for x, y in dataset:
    print(x, y)
tf.Tensor([3 3 5 0 0 0], shape=(6,), dtype=int32) tf.Tensor([-1 1 -1 0 0 0], shape=(6,), dtype=int32)
tf.Tensor([ 3 3 14 4 17 20], shape=(6,), dtype=int32) tf.Tensor([-1 -1 1 -1 -1 -1], shape=(6,), dtype=int32)
tf.Tensor([ 3 14 22 17 0 0], shape=(6,), dtype=int32) tf.Tensor([-1 1 -1 2 0 0], shape=(6,), dtype=int32)
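If the padding does not have to happen inside the tf.data pipeline, a shorter route may be to densify the ragged tensors before slicing them. The sketch below assumes the whole dataset fits in memory and that post-padding with 0 is what you want; the names dense_a, dense_b, padded_dataset and padded_rows are only illustrative.

```python
import tensorflow as tf

tensor_a = tf.ragged.constant([[3, 3, 5], [3, 3, 14, 4, 17, 20], [3, 14, 22, 17]])
tensor_b = tf.ragged.constant([[-1, 1, -1], [-1, -1, 1, -1, -1, -1], [-1, 1, -1, 2]])

# RaggedTensor.to_tensor() right-pads every row up to the longest row's length.
dense_a = tensor_a.to_tensor(default_value=0)
dense_b = tensor_b.to_tensor(default_value=0)

# The slices are now ordinary fixed-length tensors of shape (6,).
padded_dataset = tf.data.Dataset.from_tensor_slices((dense_a, dense_b))

# Collect the rows as plain Python lists for inspection.
padded_rows = [(x.numpy().tolist(), y.numpy().tolist()) for x, y in padded_dataset]
for x, y in padded_rows:
    print(x, y)
```

This only pads on the right, so it is not a replacement when the zeros need to go in front of the sequence.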
For pre-padding, just adjust the pad function:
def pad(x, y):
    # Prepend zeros so every sequence reaches max_length (pre-padding).
    x = tf.concat([tf.zeros((int(max_length-tf.shape(x)[0]),), dtype=x.dtype), x], axis=0)
    y = tf.concat([tf.zeros((int(max_length-tf.shape(y)[0]),), dtype=y.dtype), y], axis=0)
    return x, y
tf.Tensor([0 0 0 3 3 5], shape=(6,), dtype=int32) tf.Tensor([ 0 0 0 -1 1 -1], shape=(6,), dtype=int32)
tf.Tensor([ 3 3 14 4 17 20], shape=(6,), dtype=int32) tf.Tensor([-1 -1 1 -1 -1 -1], shape=(6,), dtype=int32)
tf.Tensor([ 0 0 3 14 22 17], shape=(6,), dtype=int32) tf.Tensor([ 0 0 -1 1 -1 2], shape=(6,), dtype=int32)
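Since the question mentions tf.pad, here is a minimal sketch of the same pre-padding written with tf.pad instead of tf.concat. The helper name pre_pad and the use of bounding_shape() to find the longest row are my own choices for illustration, not part of the answer above.

```python
import tensorflow as tf

tensor_a = tf.ragged.constant([[3, 3, 5], [3, 3, 14, 4, 17, 20], [3, 14, 22, 17]])
tensor_b = tf.ragged.constant([[-1, 1, -1], [-1, -1, 1, -1, -1, -1], [-1, 1, -1, 2]])
dataset = tf.data.Dataset.from_tensor_slices((tensor_a, tensor_b))

# bounding_shape() returns [number_of_rows, longest_row_length] for a ragged tensor.
max_length = tf.cast(tensor_a.bounding_shape()[1], tf.int32)

def pre_pad(x, y):
    # Number of zeros missing from this row.
    missing = max_length - tf.shape(x)[0]
    # For a 1-D tensor, paddings is [[before, after]]; swap the two values for post-padding.
    paddings = [[missing, 0]]
    return tf.pad(x, paddings), tf.pad(y, paddings)

pre_padded = dataset.map(pre_pad)
pre_padded_rows = [(x.numpy().tolist(), y.numpy().tolist()) for x, y in pre_padded]
for x, y in pre_padded_rows:
    print(x, y)
```

tf.pad fills with zeros by default and keeps the input dtype, so no explicit dtype handling is needed here.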