How to join two tf.data.Dataset tensor slices?

Question

I have one tensor slice with all the images and one tensor with their mask images. How do I combine/join/add them into a single tf.data.Dataset?

# turning them into tensor data
val_img_data = tf.data.Dataset.from_tensor_slices(np.array(all_val_img))
val_mask_data = tf.data.Dataset.from_tensor_slices(np.array(all_val_mask))

Then I mapped a function over the paths to turn them into images:

val_img_tensor = val_img_data.map(get_image)
val_mask_tensor = val_mask_data.map(get_image)

So now I have two tensors, one of images and the other of masks. How do I join them into a single combined dataset?

I tried zipping them, but it didn't work:

val_data = tf.data.Dataset.from_tensor_slices(zip(val_img_tensor, val_mask_tensor))

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/util/structure.py in normalize_element(element, element_signature)
    101         if spec is None:
--> 102           spec = type_spec_from_value(t, use_fallback=False)
    103       except TypeError:

TypeError: Could not build a `TypeSpec` for <zip object at 0x7f08f3862050> with type zip

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
    100       dtype = dtypes.as_dtype(dtype).as_datatype_enum
    101   ctx.ensure_initialized()
--> 102   return ops.EagerTensor(value, ctx.device_name, dtype)
    103 
    104 

ValueError: Attempt to convert a value (<zip object at 0x7f08f3862050>) with an unsupported type (<class 'zip'>) to a Tensor.

Answer 1

Djinn's comment is mostly what you need to follow; here is an end-to-end answer. This is how you can build a data pipeline for training a segmentation model, where each training sample is an (image, mask) pair.

First, get the sample paths:

images = [
    "1.jpg",
    "2.jpg",
    "3.jpg",  # ...
]

masks = [
    "1.png",
    "2.png",
    "3.png",  # ...
]
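The pipeline relies on the two lists being index-aligned, i.e. images[i] and masks[i] belong to the same sample. A minimal sketch of building such lists from disk follows; the directory layout (one folder of .jpg images and one folder of .png masks sharing the same file stem) and the helper name paired_paths are hypothetical assumptions, not part of the original answer.

```python
from pathlib import Path


def paired_paths(image_dir, mask_dir, image_ext=".jpg", mask_ext=".png"):
    """Return index-aligned lists of image and mask paths.

    Hypothetical layout: every image <stem>.jpg in image_dir has a mask
    <stem>.png with the same stem in mask_dir.
    """
    images, masks = [], []
    # Sort so the order is deterministic across runs.
    for image_path in sorted(Path(image_dir).glob(f"*{image_ext}")):
        mask_path = Path(mask_dir) / (image_path.stem + mask_ext)
        if not mask_path.exists():
            # Skip images without a matching mask so the lists stay aligned.
            continue
        images.append(str(image_path))
        masks.append(str(mask_path))
    return images, masks
```

Pairing by file stem, rather than sorting two independent directory listings, keeps a missing mask from silently shifting every later pair by one.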

Second, define the hyperparameters (image size, batch size, etc.) and build the tf.data input pipeline:

import tensorflow as tf

IMAGE_SIZE = 128
BATCH_SIZE = 86

def read_image(image_path, mask=False):
    image = tf.io.read_file(image_path)

    if mask:
        # Masks: one channel of class labels. Nearest-neighbour resizing keeps
        # every pixel a valid label instead of blending neighbouring classes.
        image = tf.image.decode_png(image, channels=1)
        image.set_shape([None, None, 1])
        image = tf.image.resize(images=image, size=[IMAGE_SIZE, IMAGE_SIZE], method="nearest")
        image = tf.cast(image, tf.int32)
    else:
        # Images: three channels, resized and scaled to [0, 1].
        image = tf.image.decode_png(image, channels=3)
        image.set_shape([None, None, 3])
        image = tf.image.resize(images=image, size=[IMAGE_SIZE, IMAGE_SIZE])
        image = image / 255.

    return image

def load_data(image_path, mask_path):
    # Called once per (image_path, mask_path) element of the dataset.
    image = read_image(image_path)
    mask = read_image(mask_path, mask=True)
    return image, mask

def data_generator(image_list, mask_list, split='train'):
    # A tuple of two lists gives one dataset of (image_path, mask_path) pairs.
    dataset = tf.data.Dataset.from_tensor_slices((image_list, mask_list))
    dataset = dataset.shuffle(8*BATCH_SIZE) if split == 'train' else dataset
    dataset = dataset.map(load_data, num_parallel_calls=tf.data.AUTOTUNE)
    dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
    dataset = dataset.prefetch(tf.data.AUTOTUNE)
    return dataset
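A note on resizing masks: tf.image.resize defaults to bilinear interpolation, which averages neighbouring pixels. For a label mask that creates fractional values that are not valid class IDs, and a later tf.cast simply truncates them. The sketch below uses a made-up 2×2 mask with classes 0 and 2 to compare the default method with method="nearest".

```python
import tensorflow as tf

# Made-up 2x2 label mask (classes 0 and 2), shape [height, width, channels].
mask = tf.constant([[[0], [2]],
                    [[2], [0]]], dtype=tf.uint8)

# Default method (bilinear): returns float32 and blends neighbouring labels.
bilinear = tf.image.resize(mask, [4, 4])

# Nearest-neighbour: keeps the input dtype and only copies existing labels.
nearest = tf.image.resize(mask, [4, 4], method="nearest")

print("bilinear values:", sorted(set(bilinear.numpy().ravel().tolist())))
print("nearest values:", sorted(set(nearest.numpy().ravel().tolist())))
```

The RGB images can keep bilinear resizing, since blending colour values is harmless there.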

Finally, pass the lists of image and mask paths to build the data generator:

train_dataset = data_generator(images, masks)
image, mask = next(iter(train_dataset.take(1))) 

print(image.shape, mask.shape)
(86, 128, 128, 3) (86, 128, 128, 1)

Here you can see that tf.data.Dataset.from_tensor_slices loads the training pairs and returns each one as a tuple (no zipping needed). Hope this resolves your problem. I've also answered your other question about augmentation pipelines, HERE. For more, check out the resources I've shared on semantic segmentation modeling approaches; they may help.
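To see the pairing in isolation, the sketch below builds the same pairs twice from the placeholder file names used above (treated as plain strings; no files are read): once with from_tensor_slices on a tuple of lists, and once with tf.data.Dataset.zip over two single-column datasets. Both should yield identical (image_path, mask_path) tuples.

```python
import tensorflow as tf

images = ["1.jpg", "2.jpg", "3.jpg"]
masks = ["1.png", "2.png", "3.png"]

# One dataset built from a tuple: each element is an (image_path, mask_path) pair.
pairs = tf.data.Dataset.from_tensor_slices((images, masks))

# The same pairs, built by zipping two single-column datasets.
zipped = tf.data.Dataset.zip((
    tf.data.Dataset.from_tensor_slices(images),
    tf.data.Dataset.from_tensor_slices(masks),
))

# as_numpy_iterator() keeps the tuple structure and turns strings into bytes.
for image_path, mask_path in pairs.as_numpy_iterator():
    print(image_path.decode(), mask_path.decode())
```

Building from a tuple is the simpler option when both inputs are plain path lists; zip is useful when the two sides already exist as separate datasets.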

Answer 2

Maybe try tf.data.Dataset.zip:

val_data = tf.data.Dataset.zip((val_img_tensor, val_mask_tensor))
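Applied to the question's code, a minimal sketch follows. Python's built-in zip() returns a lazy iterator, which from_tensor_slices cannot convert to a tensor (hence the TypeError/ValueError above), whereas tf.data.Dataset.zip combines the two datasets element by element. The get_image below is a hypothetical stand-in that returns the length of each path string so the example runs without image files, and the path lists are made up.

```python
import tensorflow as tf


# Hypothetical stand-in for the question's get_image(): it returns the length
# of each path string so the example runs without real image files.
def get_image(path):
    return tf.strings.length(path)


# Made-up, index-aligned path lists.
all_val_img = ["a.jpg", "bb.jpg", "ccc.jpg"]
all_val_mask = ["a.png", "bb.png", "ccc.png"]

val_img_tensor = tf.data.Dataset.from_tensor_slices(all_val_img).map(get_image)
val_mask_tensor = tf.data.Dataset.from_tensor_slices(all_val_mask).map(get_image)

# Pair the i-th image with the i-th mask, then shuffle and batch the pairs.
val_data = tf.data.Dataset.zip((val_img_tensor, val_mask_tensor))
val_data = val_data.shuffle(3).batch(2)

for img_batch, mask_batch in val_data:
    print(img_batch.numpy(), mask_batch.numpy())
```

Shuffling each dataset separately before zipping would break the pairing, so shuffle (and batch) only after zipping.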

2 solutions

Solution 1
2 votes, accepted, 2022-10-08 15:38:28

Solution 2
1 vote, 2022-10-08 06:54:40
