Does the tf.contrib.learn.Estimator class use all the data?

Question

I have a training operation:

def train(self, batch_size, steps):
    x, y = self.generate(batch_size*steps)
    print('Training...')
    self.classifier.fit(x=x, y=y, batch_size=batch_size, steps=steps)

And the classifier is defined here:

self.classifier = learn.Estimator(model_fn=self.model, model_dir=self.dir)

My question is this: if x and y are larger than batch_size, do they all get used as training moves through the steps? For example, if batch_size is 128 but x and y each hold 128,000 items, have all items been trained on by the time steps reaches 1000?

I'm asking because the generate function takes a really long time, and I want to know whether most of that time is wasted because only the first batch_size examples it produces are actually used.

Note: I know the x and y arguments are deprecated and I should use input_fn instead, so the question applies to both ways. For example, the training operation could be this:

def train(self, batch_size, steps):
    self.classifier.fit(input_fn=lambda: self.generate(batch_size*steps), steps=steps)

In other words, should input_fn, or whichever function generates the x, y tensors, be asked for batch_size*steps examples or just batch_size, because only that many would be processed anyway?

Answer 1

If your batch_size is 128 and you have 128000 items, then all items will have been trained on when steps reaches 1000. The estimator pulls only batch_size examples at each training step.
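As a rough check of that arithmetic, here is a minimal sketch. It assumes each step consumes exactly one batch of batch_size examples and that no example is drawn twice before the data runs out. The helper names examples_consumed and steps_to_cover are made up for illustration and are not part of TensorFlow.

```python
def examples_consumed(batch_size, steps):
    """Examples drawn after `steps` steps, assuming one batch per step."""
    return batch_size * steps


def steps_to_cover(num_examples, batch_size):
    """Smallest number of steps whose batches total at least `num_examples`."""
    return -(-num_examples // batch_size)  # ceiling division


print(examples_consumed(128, 1000))  # 128000
print(steps_to_cover(128000, 128))   # 1000
```

Under those assumptions, generating batch_size*steps examples up front means every generated example is needed exactly once.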

I have written a piece of code that reads inputs (each sample is just 1). Every training step adds up the ones it has seen so far, which tells you how many data samples have been read up to that point.

import numpy as np
import tensorflow as tf
from tensorflow.contrib import learn
from tensorflow.contrib.learn.python.learn.estimators import model_fn as model_fn_lib

tf.logging.set_verbosity(tf.logging.INFO)

def model_fn(features, labels, mode):
    # Running count of the samples seen so far (every sample has the value 1)
    _sum = tf.Variable(0, dtype=tf.int32)

    # Defaults so both names are defined outside of training mode
    loss = None
    update_op = None

    if mode == learn.ModeKeys.TRAIN:
        # Update global_step
        global_step = tf.contrib.framework.get_global_step()
        global_step_op = tf.assign(global_step, global_step + 1)

        # Sum of all the elements in a batch
        update_sum_op = tf.assign_add(_sum, tf.reduce_sum(features))
        update_op = tf.group(global_step_op, update_sum_op)
        loss = _sum

    predictions = {'out': tf.identity(_sum, 'sum')}

    return model_fn_lib.ModelFnOps(mode=mode, predictions=predictions, loss=loss, train_op=update_op)


X = np.ones((1000, 1), dtype=np.int32)
y = np.ones((1000, 1), dtype=np.int32)

sess = tf.InteractiveSession()

feature_classifier = learn.SKCompat(learn.Estimator(model_fn=model_fn))
tensors_to_log = {'out':'sum'}
logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=1)
feature_classifier.fit(x=X, y=y, batch_size=123, steps=7, monitors=[logging_hook])

Here the total number of data samples is 1000, with batch_size=123 and steps=7.

The output at each step is:

INFO:tensorflow:out = 123
INFO:tensorflow:out = 246 (0.004 sec)
INFO:tensorflow:out = 369 (0.003 sec)
INFO:tensorflow:out = 492 (0.003 sec)
INFO:tensorflow:out = 615 (0.003 sec)
INFO:tensorflow:out = 738 (0.003 sec)
INFO:tensorflow:out = 861 (0.003 sec)
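
The logged values are simply multiples of the batch size. A plain-Python sketch of the same bookkeeping, with no TensorFlow involved, reproduces them. It assumes one batch of 123 ones is summed per step and that batches are taken as consecutive slices; the real feeder may order or shuffle the data differently, which does not change the counts here because every sample equals 1. running_totals is an illustrative helper, not a library function.

```python
def running_totals(num_examples, batch_size, steps):
    """Cumulative number of examples seen after each step (each example is 1)."""
    data = [1] * num_examples
    totals = []
    seen = 0
    for step in range(steps):
        # Assumption: batches are consecutive, non-overlapping slices
        batch = data[step * batch_size:(step + 1) * batch_size]
        seen += sum(batch)
        totals.append(seen)
    return totals


totals = running_totals(1000, 123, 7)
print(totals)             # [123, 246, 369, 492, 615, 738, 861]
print(1000 - totals[-1])  # 139 examples not yet read after 7 steps
```

With these settings, 7 steps read 861 of the 1000 samples, so steps would need to be at least 9 before the total drawn reaches 1000.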

Answer 1 (accepted, 2017-07-09 11:31:47)
