How to use the TensorFlow Dataset API to read files with different names without evaluating the filename string

Question

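Suppose I receive csv dataset files named index_channel.csv, where index is the index of the example (from 1 to 10000) and channel is the index of the channel (from 1 to 5). So 7_3.csv is the 3rd channel of the 7th example. I want to load all of these csv files and concatenate the channels to get the correct tensor as my dataset. I am missing a reference to the functions that would let me do this. Below is the code I have so far. When I run it, it complains with TypeError: expected str, bytes or os.PathLike object, not Tensor. My guess is that it is trying to evaluate the expression immediately rather than only after sess.run() is called, but I am not sure how to get around it.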

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# Imports
import numpy as np
import tensorflow as tf
from tensorflow.contrib.data import Dataset, Iterator

def main(unused_argv):
  train_imgs = tf.constant(["1","2","3"]) #just trying the 3 first examples
  tr_data = Dataset.from_tensor_slices((train_imgs))
  tr_data = tr_data.map(input_parser)

  # create TensorFlow Iterator object
  iterator = Iterator.from_structure(tr_data.output_types,
                                   tr_data.output_shapes)
  next_element = iterator.get_next()
  training_init_op = iterator.make_initializer(tr_data)
  with tf.Session() as sess:

    # initialize the iterator on the training data
    sess.run(training_init_op)
    # get each element of the training dataset until the end is reached
    while True:
        try:
            elem = sess.run(next_element)
            print(elem)
        except tf.errors.OutOfRangeError:
            print("End of training dataset.")
            break

def input_parser(index):
  dic={}
  for d in range(1,6):
    a=np.loadtxt(open("./data_for_tf/" + index +"_M"+str(d)+".csv", "rb"), delimiter=",", skiprows=1)
    dic[d]=tf.convert_to_tensor(a, dtype=tf.float32)
  metric=np.stack((dic[1],dic[2],dic[3])) 
  return metric

Sorry, I am new to TF. My question may seem trivial, but none of the examples I found by googling answered it.

Answer 1

It looks to me like the error comes from the use of index in the following line:

a=np.loadtxt(open("./data_for_tf/" + index +"_M"+str(d)+".csv", "rb"), delimiter=",", skiprows=1)

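As you may suspect, your input_parser is called exactly once, while TensorFlow is setting up its declarative model; that call establishes the relationships between TensorFlow operations for later evaluation. However, your Python calls (such as the numpy operations) run immediately during that setup. At that point, np.loadtxt is trying to open a path built from a TF op that has not yet been given a value.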

If that is the case, you do not even need to run the model to produce the error (try removing the sess.run() calls).
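The mechanism can be reproduced without TensorFlow. The sketch below is only an illustration: SymbolicString and trace are hypothetical stand-ins (they are not TensorFlow APIs) for a graph-mode string tensor and for the single call that map() makes while the graph is being built. Concatenating with the placeholder succeeds, but open() needs a real path and fails during setup, before anything is "run".

```python
class SymbolicString:
    """Hypothetical stand-in for a graph-mode string tensor (not TensorFlow)."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        # symbolic + "text" yields another symbolic value, not a str
        return SymbolicString("(%s + %r)" % (self.expr, other))

    def __radd__(self, other):
        # "text" + symbolic also yields a symbolic value
        return SymbolicString("(%r + %s)" % (other, self.expr))


calls = []  # records every argument input_parser receives


def input_parser(index):
    calls.append(index)
    path = "./data_for_tf/" + index + "_M1.csv"  # still symbolic, not a str
    return open(path, "rb")                      # plain Python: runs right now


def trace(fn):
    """Call fn once with a placeholder, the way a graph-building map() would."""
    return fn(SymbolicString("index"))


try:
    trace(input_parser)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, len(calls))
```

The parser is called once, with a placeholder rather than with "1", "2" or "3", and the TypeError is raised by open() at that moment.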

You will notice that the example at https://www.tensorflow.org/programmers_guide/datasets#preprocessing_data_with_datasetmap uses TF file-access functions to read the data:

filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]

dataset = tf.data.Dataset.from_tensor_slices(filenames)

# Use `Dataset.flat_map()` to transform each file as a separate nested dataset,
# and then concatenate their contents sequentially into a single "flat" dataset.
# * Skip the first line (header row).
# * Filter out lines beginning with "#" (comments).

dataset = dataset.flat_map(
    lambda filename: (
        tf.data.TextLineDataset(filename)
        .skip(1)
        .filter(lambda line: tf.not_equal(tf.substr(line, 0, 1), "#"))))

It is designed to be part of the declarative TF model (that is, the filename is resolved at run time).

Here are more examples of reading files with TensorFlow operations:

https://www.tensorflow.org/get_started/datasets_quickstart#reading_a_csv_file

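It is also possible to use imperative Python functions (see "Applying arbitrary Python logic with tf.py_func()" in the first link), although this is recommended only when there is no other option.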

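So, basically, unless you use the tf.py_func() mechanism, you cannot expect ordinary Python operations that depend on TF tensors or ops to work as intended. They can, however, be used in loop constructs to build up interrelated TF ops.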

Update:

Here is a schematic example:

## For a simple example, I have four files <index>_<channel>_File.txt
## so, 1_1_File.txt, 1_2_File.txt

import tensorflow as tf

def input_parser(filename):
   filesWithChannels = []

   for i in range(1,3):
       channel_data =  tf.read_file(filename+'_'+str(i)+'_File.txt')

       ## Uncomment the two lines below to add csv parsing.
       # channel_data = tf.sparse_tensor_to_dense(tf.string_split([channel_data],'\n'), default_value='')
       # channel_data = tf.decode_csv(channel_data, record_defaults=[[1.],[1.]])

       filesWithChannels.append(channel_data)

   return tf.convert_to_tensor(filesWithChannels)


train_imgs = tf.constant(["1","2"]) # e.g.
tr_data = tf.data.Dataset.from_tensor_slices(train_imgs)
tr_data = tr_data.map(input_parser)

iterator = tr_data.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    for i in range(2) :
        out = sess.run(next_element)
        print(out)

Second update (adding csv):

## For a simple example, I have four files <index>_<channel>_File.txt
## so, 1_1_File.txt, 1_2_File.txt

import tensorflow as tf

with tf.device('/cpu:0'):
    def input_parser(filename):
       filesWithChannels = []

       for i in range(1,3):
             channel_data = (tf.data.TextLineDataset(filename+'_'+str(i)+'_File.txt')
                               .map(lambda line: tf.decode_csv(line, record_defaults=[[1.],[1.]])))

             filesWithChannels.append(channel_data)

       return tf.data.Dataset.zip(tuple(filesWithChannels))

train_imgs = tf.constant(["1","2"]) # e.g.
tr_data = tf.data.Dataset.from_tensor_slices(train_imgs)
tr_data = tr_data.flat_map(input_parser)

iterator = tr_data.make_one_shot_iterator()
next_element = iterator.get_next()
next_tensor_element = tf.convert_to_tensor(next_element)

with tf.Session() as sess:
    for i in range(2) :
        out = sess.run(next_tensor_element)
        print(out)

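See the documentation for tf.decode_csv for details on how to set the field delimiter and how to use record_defaults to specify the number of columns and their default values.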
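To make the element structure of the csv version easier to picture, here is a plain-Python analogue that uses the built-in zip and the csv module instead of TensorFlow. It is only a sketch: the FILES dictionary holds made-up, in-memory contents for the hypothetical 1_1_File.txt style names used above, and read_channel, zipped_channels and flat_map are illustrative helpers, not TensorFlow functions.

```python
import csv
import io

# Made-up file contents, keyed by the <index>_<channel>_File.txt naming scheme.
FILES = {
    "1_1_File.txt": "1.0,2.0\n3.0,4.0\n",
    "1_2_File.txt": "5.0,6.0\n7.0,8.0\n",
    "2_1_File.txt": "9.0,10.0\n",
    "2_2_File.txt": "11.0,12.0\n",
}


def read_channel(name):
    """Yield each CSV row of one channel file as a tuple of floats."""
    for row in csv.reader(io.StringIO(FILES[name])):
        yield tuple(float(field) for field in row)


def zipped_channels(index, channels=(1, 2)):
    """Pair up the rows of every channel file belonging to one example index."""
    readers = [read_channel("%s_%d_File.txt" % (index, c)) for c in channels]
    return zip(*readers)


def flat_map(indices):
    """Chain the zipped rows of all indices into one flat sequence."""
    for index in indices:
        for element in zipped_channels(index):
            yield element


elements = list(flat_map(["1", "2"]))
for element in elements:
    print(element)
```

Each element is one row position, holding one tuple of column values per channel. With these made-up contents there are three elements in total (two rows for index "1", one for index "2"), so a loop that fetches a fixed number of elements may need to be adjusted to the real row counts.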
