
What should I do if I want to use large datasets that can't be loaded into memory with TensorFlow?

I want to train a model with TensorFlow on a large dataset that cannot be loaded into memory all at once, but I don't know exactly what I should do.

I have read some great posts about the TFRecords file format, as well as the official documentation, but I still can't figure it out.

Is there a complete solution for this in TensorFlow?

Consider using tf.TextLineReader, which, in conjunction with tf.train.string_input_producer, lets you load data from multiple files on disk (useful if your dataset is large enough that it needs to be spread across multiple files).

See https://www.tensorflow.org/programmers_guide/reading_data#reading_from_files

Code snippet from the link above:

import tensorflow as tf  # TensorFlow 1.x queue-based input pipeline

filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[1], [1], [1], [1], [1]]
col1, col2, col3, col4, col5 = tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.stack([col1, col2, col3, col4])

with tf.Session() as sess:
  # Start populating the filename queue.
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  for i in range(1200):
    # Retrieve a single instance:
    example, label = sess.run([features, col5])

  coord.request_stop()
  coord.join(threads)
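To make the idea behind that pipeline concrete without any TensorFlow machinery, here is a rough plain-Python sketch of the same pattern: walk through several CSV files one line at a time, fill empty columns from a list of defaults, and split each row into features and a label. The helper name iter_csv_rows and the temporary demo files are assumptions made for illustration; they are not part of TensorFlow or of the linked guide.

```python
import csv
import os
import tempfile


def iter_csv_rows(filenames, defaults):
    """Yield one decoded row at a time; only the current line is held in memory."""
    for filename in filenames:
        with open(filename, newline="") as f:
            for row in csv.reader(f):
                # An empty column falls back to its default; the default's
                # type also decides how a non-empty field is parsed.
                yield [type(d)(cell) if cell != "" else d
                       for cell, d in zip(row, defaults)]


# Demo: two tiny temporary files standing in for file0.csv and file1.csv.
tmpdir = tempfile.mkdtemp()
paths = []
for n, text in enumerate(["1,2,3,4,0\n5,,7,8,1\n", "9,10,11,12,0\n"]):
    path = os.path.join(tmpdir, "file%d.csv" % n)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

rows = list(iter_csv_rows(paths, defaults=[1, 1, 1, 1, 1]))
# First four columns are the features, the fifth is the label.
examples = [(row[:4], row[4]) for row in rows]
```

In a real training loop you would iterate over the generator directly instead of calling list() on it, so that the full dataset never sits in memory at once.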

Normally you train batch-wise anyway, so you can load the data on the fly. For example, for images:

# nrBatches is the number of batches; load_data_from_hd reads one batch from disk.
for bid in range(nrBatches):
    batch_x, batch_y = load_data_from_hd(bid)
    train_step.run(feed_dict={x: batch_x, y_: batch_y})

This way you load each batch on the fly and keep only the data you need at any given moment in memory. Naturally, training will take longer when data is read from the hard disk rather than from memory.
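The answer leaves load_data_from_hd undefined. One possible shape for it, purely as a sketch, assumes the dataset has been split ahead of time into one file per batch. The file layout (batch_<bid>.json), the save_batch helper, and the DATA_DIR location are all hypothetical choices for this example, and the training step is replaced by a simple counter.

```python
import json
import os
import tempfile

DATA_DIR = tempfile.mkdtemp()  # hypothetical location of the pre-split dataset


def save_batch(bid, batch_x, batch_y):
    """Write one batch to its own file (done once, ahead of training)."""
    with open(os.path.join(DATA_DIR, "batch_%d.json" % bid), "w") as f:
        json.dump({"x": batch_x, "y": batch_y}, f)


def load_data_from_hd(bid):
    """Read only batch `bid` from disk and return (inputs, labels)."""
    with open(os.path.join(DATA_DIR, "batch_%d.json" % bid)) as f:
        batch = json.load(f)
    return batch["x"], batch["y"]


# Demo: three tiny batches standing in for a dataset too large for memory.
nrBatches = 3
for bid in range(nrBatches):
    save_batch(bid, [[bid, bid + 1], [bid + 2, bid + 3]], [0, 1])

total_examples = 0
for bid in range(nrBatches):
    # Only this one batch is held in memory during the iteration.
    batch_x, batch_y = load_data_from_hd(bid)
    # A real loop would run the training step on (batch_x, batch_y) here.
    total_examples += len(batch_x)
```

For image data you would more likely store file paths per batch and decode the images inside load_data_from_hd, but the control flow stays the same.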


