
Reading a large dataset in TensorFlow

I am not quite sure how the file queue works. I am trying to use a large dataset like ImageNet as input, so preloading all the data is not an option, and I am wondering how to use the file queue instead. According to the tutorial, we can convert the data to a TFRecords file and use that as input. Now we have a single big TFRecords file. When we specify a FIFO queue for the reader, does that mean the program fetches a batch of data each time and feeds it to the graph, rather than loading the whole file?

The amount of prefetching depends on your queue capacity. If you use string_input_producer for your filenames and batch for batching, you will have two queues: the filename queue, and the prefetching queue created by batch. The queue created by batch has a default capacity of 32, controlled by the batch(..., capacity=) argument, so it can prefetch up to 32 images. If you follow the outline in the official TensorFlow how-tos, processing the examples (everything after batch) happens in the main Python thread, whereas filling the queue happens in the threads created/started by batch/start_queue_runners. Prefetching new data and running prefetched data through the network therefore occur concurrently, blocking when the queue becomes full or empty.
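The two-queue behaviour described above can be sketched without TensorFlow, using Python's standard `queue` and `threading` modules as a stand-in. This is only an analogy, not the TF API: the bounded `prefetch_queue` plays the role of the queue created by `tf.train.batch` (default capacity 32), the reader thread plays the role of the threads started by `start_queue_runners`, and the main loop plays the role of the training loop calling `sess.run`. All names and sizes here are illustrative.

```python
import queue
import threading

CAPACITY = 32  # mirrors batch(..., capacity=32), the default prefetch depth


def run_pipeline(filenames, records_per_file=4):
    # "Filename queue": in TF this would be built by string_input_producer.
    filename_queue = queue.Queue()
    for name in filenames:
        filename_queue.put(name)

    # Bounded "prefetch queue": put() blocks when it holds CAPACITY items,
    # just as the batch queue blocks its filler threads when full.
    prefetch_queue = queue.Queue(maxsize=CAPACITY)
    consumed = []

    def reader():
        # Reader thread: pull a filename, "decode" its records, and push
        # each record into the prefetch queue, blocking if it is full.
        while True:
            try:
                name = filename_queue.get_nowait()
            except queue.Empty:
                break
            for i in range(records_per_file):
                prefetch_queue.put((name, i))
        prefetch_queue.put(None)  # sentinel: no more records

    t = threading.Thread(target=reader)
    t.start()

    # Main thread plays the training loop: get() blocks when the prefetch
    # queue is empty, so producing and consuming overlap concurrently.
    while True:
        item = prefetch_queue.get()
        if item is None:
            break
        consumed.append(item)

    t.join()
    return consumed
```

Running `run_pipeline(["a.tfrecords", "b.tfrecords"])` yields all eight hypothetical records, with the reader thread filling the queue while the main thread drains it.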

