
Does the tf.contrib.learn.Estimator class use all the data?

I have a training operation:

def train(self, batch_size, steps):
    x, y = self.generate(batch_size*steps)
    print('Training...')
    self.classifier.fit(x=x, y=y, batch_size=batch_size, steps=steps)

And the classifier is defined here:

self.classifier = learn.Estimator(model_fn=self.model, model_dir=self.dir)

My question is this: if x and y are larger than batch_size, do they all get used as training moves through the steps? For example, if batch_size is 128 but x and y each contain 128,000 items, will every item have been trained on by the time training reaches 1000 steps?

I'm asking because the generate function takes a really long time, and I want to know whether most of that time is wasted, which would be the case if only the first batch_size examples it produces are actually used.

Note: I know the x and y arguments are deprecated and that I should use input_fn instead, so the question applies to both approaches. For example, the training operation could be this:

def train(self, batch_size, steps):
    self.classifier.fit(input_fn=lambda: self.generate(batch_size*steps), steps=steps)

In other words, should the input_fn function (or whatever function generates the x, y tensors) be asked for batch_size*steps data examples, or just batch_size, because only that many would be processed anyway?

If your batch_size is 128 and you have 128,000 items, then all items will have been trained on by the time steps reaches 1000. The estimator pulls only batch_size samples for each training step.
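To make the counting concrete, here is a minimal sketch in plain Python. It is not the Estimator's actual feeding code: simulate_feeding is a hypothetical stand-in that assumes batches are taken in order and wrap around at the end of the data, whereas a real feeder may shuffle first.

```python
def simulate_feeding(n_items, batch_size, steps):
    """Return the set of item indices consumed after `steps` batches.

    Hypothetical stand-in for a batch feeder: it walks the data in order
    and wraps around at the end. Real feeders may shuffle first.
    """
    seen = set()
    position = 0
    for _ in range(steps):
        # One training step consumes exactly batch_size items.
        for offset in range(batch_size):
            seen.add((position + offset) % n_items)
        position = (position + batch_size) % n_items
    return seen


# 1000 steps of 128 items cover all 128,000 items exactly once.
seen = simulate_feeding(n_items=128000, batch_size=128, steps=1000)
print(len(seen))  # 128000

# After only 10 steps, just 10 * 128 items have been touched.
partial = simulate_feeding(n_items=128000, batch_size=128, steps=10)
print(len(partial))  # 1280
```

Under these assumptions, generating batch_size*steps examples means every generated example is consumed exactly once.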

I have written a piece of code that reads inputs in which every sample is just 1. Each training step adds the ones in its batch to a running sum, which tells you how many data samples have been read up to that step.

import numpy as np
import tensorflow as tf
from tensorflow.contrib import learn
from tensorflow.contrib.learn.python.learn.estimators import model_fn as model_fn_lib

tf.logging.set_verbosity(tf.logging.INFO)

def model_fn(features, labels, mode):
    # Running count of the samples seen so far (every sample has value 1)
    _sum = tf.Variable(0, dtype=tf.int32)

    # Defined for every mode so the return statement never hits an unbound name
    loss = _sum
    update_op = None

    if mode == learn.ModeKeys.TRAIN:
        # Update global_step
        global_step = tf.contrib.framework.get_global_step()
        global_step_op = tf.assign(global_step, global_step + 1)

        # Add the sum of all the elements in this batch to the running count
        update_sum_op = tf.assign_add(_sum, tf.reduce_sum(features))
        update_op = tf.group(global_step_op, update_sum_op)

    predictions = {'out': tf.identity(_sum, 'sum')}

    return model_fn_lib.ModelFnOps(mode=mode, predictions=predictions, loss=loss, train_op=update_op)


X = np.ones((1000, 1), dtype=np.int32)
y = np.ones((1000, 1), dtype=np.int32)

sess = tf.InteractiveSession()

feature_classifier = learn.SKCompat(learn.Estimator(model_fn=model_fn))
tensors_to_log = {'out': 'sum'}
logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=1)
feature_classifier.fit(x=X, y=y, batch_size=123, steps=7, monitors=[logging_hook])

Here the total number of data samples is 1000, with batch_size=123 and steps=7.

The output at each step is:

INFO:tensorflow:out = 123
INFO:tensorflow:out = 246 (0.004 sec)
INFO:tensorflow:out = 369 (0.003 sec)
INFO:tensorflow:out = 492 (0.003 sec)
INFO:tensorflow:out = 615 (0.003 sec)
INFO:tensorflow:out = 738 (0.003 sec)
INFO:tensorflow:out = 861 (0.003 sec)
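
The logged totals are multiples of the batch size, which is what you would expect if each step reads one fresh batch of 123 samples. The short sketch below reproduces that arithmetic in plain Python. The variable names are illustrative, and the last line assumes one pass simply means enough steps to read every sample at least once.

```python
import math

n_samples = 1000
batch_size = 123
steps = 7

# Cumulative number of samples read after each training step.
running_totals = [batch_size * step for step in range(1, steps + 1)]
print(running_totals)  # [123, 246, 369, 492, 615, 738, 861]

# Samples that have not been read yet after the last step.
unread = n_samples - running_totals[-1]
print(unread)  # 139

# Steps needed to read every sample at least once.
steps_for_full_pass = math.ceil(n_samples / batch_size)
print(steps_for_full_pass)  # 9
```

So with steps=7 only 861 of the 1000 samples are read. Covering all of them at this batch size would take 9 steps.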
