How do I access the filenames collected by tf.data.Dataset.list_files()?

Question

I am using

file_data = tf.data.Dataset.list_files("../*.png")

to collect image files for training in TensorFlow, but would like to access the list of gathered filenames so I can perform a label lookup.

Calling sess.run([file_data]) has been unsuccessful:

TypeError: Fetch argument <TensorSliceDataset shapes: (), types: tf.string> has invalid type <class 'tensorflow.python.data.ops.dataset_ops.TensorSliceDataset'>, must be a string or Tensor. (Can not convert a TensorSliceDataset into a Tensor or Operation.)

Are there any other methods I can use?

Answer 1

With some additional experimenting, I found a way to solve this:

First, turn the Dataset into an iterator:

iterator_helper = file_data.make_one_shot_iterator()

Then, iterate through the elements in a tf.Session:

with tf.Session() as sess:
    filename_temp = iterator_helper.get_next()
    # Each sess.run() call returns the next filename as a bytes object.
    print(sess.run(filename_temp))

Answer 2

The Dataset.list_files() API uses the tf.matching_files() op to list the files matching the given pattern. You can also get the list of files as a tf.Tensor using that op, and pass it directly to sess.run():

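import tensorflow as tf

filenames_as_tensor = tf.matching_files("../*.png")

with tf.Session() as sess:
    filenames_as_array = sess.run(filenames_as_tensor)

# Each entry is a bytes object holding one matching path.
for filename in filenames_as_array:
    print(filename)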
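Since the question asks for a label lookup, here is a minimal sketch of that last step in plain Python. It relies on the fact that TensorFlow hands string tensors back to Python as bytes objects, so each name is decoded before the lookup. The LABELS table and the lookup_labels helper are hypothetical names made up for illustration; they are not part of TensorFlow or of the original answer.

```python
import os

# Hypothetical label table keyed by base file name (an assumption for illustration).
LABELS = {"cat_001.png": "cat", "dog_001.png": "dog"}


def lookup_labels(filenames, labels, default=None):
    """Map each filename (bytes or str) to a label, matching on the base name."""
    result = {}
    for name in filenames:
        # Filenames fetched from TensorFlow arrive as bytes; decode them first.
        if isinstance(name, bytes):
            name = name.decode("utf-8")
        result[name] = labels.get(os.path.basename(name), default)
    return result
```

With the array from the answer above, this would be called as lookup_labels(filenames_as_array, LABELS). Files without an entry in the table map to the default value rather than raising an error.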

Answer 3

Here is how I've done it in TensorFlow 2:

def load_image_train(image_file):
  """Load an image and return it together with its file path."""
  # load_image_func is not defined in this answer; substitute your own image loader.
  my_image = load_image_func(image_file)
  return my_image, image_file

Then use tf.data.Dataset.list_files to load the list of files we have in a folder:

import os

PATH = "path_to_dataset_folder"
train_dataset_names = tf.data.Dataset.list_files(os.path.join(PATH, 'train/*.jpg'))

Finally, map them so you get both the file path and the data as tensors:

train_dataset = train_dataset_names.map(load_image_train,
                         num_parallel_calls=tf.data.AUTOTUNE)

Then you can separate them as you wish and use the file name as a label or for anything else.
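As a rough sketch of what separating the pairs can look like in TensorFlow 2 eager mode, the example below iterates over the mapped dataset and unpacks each element into its data and its path. The loader here is a stand-in that returns the raw file bytes via tf.io.read_file, used only because the answer's load_image_func is not shown; load_with_path and collect_pairs are hypothetical names, and shuffle=False is chosen so the listing order is repeatable.

```python
import os
import tempfile

import tensorflow as tf


def load_with_path(path):
    # Stand-in loader (an assumption): raw file bytes instead of a decoded image.
    return tf.io.read_file(path), path


def collect_pairs(pattern):
    """Return (base name, byte count) for every file matching the pattern."""
    names = tf.data.Dataset.list_files(pattern, shuffle=False)
    dataset = names.map(load_with_path, num_parallel_calls=tf.data.AUTOTUNE)
    pairs = []
    for data, path in dataset:
        # Each element is a tuple of eager tensors; .numpy() on a string
        # tensor returns a Python bytes object.
        filename = os.path.basename(path.numpy().decode("utf-8"))
        pairs.append((filename, len(data.numpy())))
    return pairs


if __name__ == "__main__":
    # Demo on two throwaway files in a temporary folder.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("a.jpg", "b.jpg"):
            with open(os.path.join(folder, name), "wb") as handle:
                handle.write(b"placeholder")
        for filename, size in collect_pairs(os.path.join(folder, "*.jpg")):
            print(filename, size)
```

The decoded file name can then be used for a label lookup, or parsed directly if the label is encoded in the name.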


Answer 1
Score: 3, accepted, 2018-07-03 20:58:04

Answer 2
Score: 2, 2018-07-04 23:48:05

Answer 3
Score: 0, 2022-05-02 21:12:15
