Reading a large dataset in TensorFlow

Original post: 2016-01-25 23:46:13 | Tags: python, deep-learning, tensorflow

I am not quite sure how the file queue works. I am trying to use a large dataset such as ImageNet as input, so preloading the data is not an option, and I am wondering how to use the file queue. According to the tutorial, we can convert the data to a TFRecords file and use that as input. Now we have a single big TFRecords file. When we specify a FIFO queue for the reader, does that mean the program fetches one batch of data at a time and feeds it to the graph, instead of loading the whole file?

1 answer

The amount of prefetching depends on your queue capacity. If you use string_input_producer for your filenames and batch for batching, you will have two queues: a filename queue, and a prefetching queue created by batch. The queue created by batch has a default capacity of 32, controlled by the batch(...,capacity=) argument, so it can prefetch up to 32 images. If you follow the outline in the official TensorFlow how-tos, processing the examples (everything after batch) happens in the main Python thread, whereas filling the queue happens in threads created and started via batch and start_queue_runners. Prefetching new data and running already prefetched data through the network therefore occur concurrently, with the filling threads blocking when the queue is full and the main thread blocking when it is empty.
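
To make the blocking behaviour concrete, here is a minimal sketch of the same producer/consumer pattern. It is not TensorFlow code: it uses Python's standard queue.Queue and threading as a stand-in for the prefetching queue and the queue-runner thread. The names read_records, fill_queue, consume_batches and run_pipeline, the batch size of 8 and the 100-record dataset are all hypothetical, chosen only for illustration; only the capacity of 32 comes from the answer above.

```python
import queue
import threading

CAPACITY = 32      # same number as the default capacity mentioned above
BATCH_SIZE = 8     # hypothetical batch size
NUM_RECORDS = 100  # hypothetical dataset size


def read_records(n):
    """Stand-in for a reader that yields one record at a time from a big file."""
    for i in range(n):
        yield i


def fill_queue(q, n, peak):
    """Producer thread: put() blocks whenever the queue is full."""
    for record in read_records(n):
        q.put(record)
        peak[0] = max(peak[0], q.qsize())  # track how far ahead we prefetched
    q.put(None)  # sentinel: no more records


def consume_batches(q, batch_size):
    """Consumer (main thread): get() blocks whenever the queue is empty."""
    batches, batch = [], []
    while True:
        record = q.get()
        if record is None:
            break
        batch.append(record)
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        # Keeping the final partial batch is a choice made for this sketch.
        batches.append(batch)
    return batches


def run_pipeline(num_records=NUM_RECORDS, batch_size=BATCH_SIZE, capacity=CAPACITY):
    q = queue.Queue(maxsize=capacity)  # bounded queue: at most `capacity` items held
    peak = [0]
    producer = threading.Thread(target=fill_queue, args=(q, num_records, peak), daemon=True)
    producer.start()
    batches = consume_batches(q, batch_size)
    producer.join()
    return batches, peak[0]


if __name__ == "__main__":
    batches, peak = run_pipeline()
    print(len(batches), "batches; queue never held more than", peak, "items")
```

At no point are more than CAPACITY records held in the queue, regardless of how large the dataset is, which is the property the question asks about.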