Does the tf.contrib.learn.Estimator class use all the data?
I have a training operation:
def train(self, batch_size, steps):
    x, y = self.generate(batch_size*steps)
    print('Training...')
    self.classifier.fit(x=x, y=y, batch_size=batch_size, steps=steps)
And the classifier is defined here:
self.classifier = learn.Estimator(model_fn=self.model, model_dir=self.dir)
My question is this: if x and y are larger than batch_size, do they all get used as training moves through the steps? For example, if batch_size is 128 but x and y each contain 128,000 items, have all items been trained on by the time steps reaches 1000?
I'm asking because the generate function takes a really long time, and I want to know whether most of that time is wasted, which would be the case if only the first batch_size examples it produces are actually used.
Note: I know the x and y arguments are deprecated and that I should use input_fn instead, so the question applies to both approaches. For example, the training operation could be:
def train(self, batch_size, steps):
    self.classifier.fit(input_fn=lambda: self.generate(batch_size*steps), steps=steps)
In other words, should input_fn (or whatever function generates the x, y tensors) be asked for batch_size*steps data examples, or just batch_size, because only that many would be processed anyway?
If your batch_size is 128 and you have 128,000 items, then all items have been trained on by the time steps reaches 1000. At every training step, the estimator pulls only batch_size samples.
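As a quick check of that arithmetic, here is a minimal sketch. It assumes each step consumes exactly batch_size previously unseen samples; the helper name samples_consumed is hypothetical and not part of TensorFlow.

```python
def samples_consumed(num_samples, batch_size, steps):
    """Distinct samples read after `steps` steps.

    Assumes one batch of `batch_size` samples per step and no sample
    reused until the whole dataset has been seen (an assumption made
    for illustration only).
    """
    return min(num_samples, batch_size * steps)


# The case from the question: 128,000 items, batch_size=128, 1000 steps.
print(samples_consumed(128000, 128, 1000))  # 128000

# With only 10 steps, most of the generated data would go unused.
print(samples_consumed(128000, 128, 10))    # 1280
```

Under that assumption, generating batch_size*steps examples produces exactly as much data as the training run reads, so none of the generation time is wasted.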
I have written a piece of code that reads inputs (each sample is just the value 1). At every training step it adds up the ones it has seen so far, which tells you how many data samples have been read up to that point.
import numpy as np
import tensorflow as tf
from tensorflow.contrib import learn
from tensorflow.contrib.learn.python.learn.estimators import model_fn as model_fn_lib

tf.logging.set_verbosity(tf.logging.INFO)

def model_fn(features, labels, mode):
    # Running count of the samples seen so far
    _sum = tf.Variable(0, dtype=tf.int32)
    # Only the TRAIN mode is handled; this model_fn is meant for fit() only
    if mode == learn.ModeKeys.TRAIN:
        # Update global_step
        global_step = tf.contrib.framework.get_global_step()
        global_step_op = tf.assign(global_step, global_step + 1)
        # Sum of all the elements in a batch
        update_sum_op = tf.assign_add(_sum, tf.reduce_sum(features))
        update_op = tf.group(global_step_op, update_sum_op)
        loss = _sum
    # Name the tensor 'sum' so the logging hook can look it up
    predictions = {'out': tf.identity(_sum, 'sum')}
    return model_fn_lib.ModelFnOps(mode=mode, predictions=predictions, loss=loss, train_op=update_op)

# 1000 samples, each with the value 1
X = np.ones((1000, 1), dtype=np.int32)
y = np.ones((1000, 1), dtype=np.int32)

sess = tf.InteractiveSession()
feature_classifier = learn.SKCompat(learn.Estimator(model_fn=model_fn))

# Log the running sum after every training step
tensors_to_log = {'out': 'sum'}
logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=1)
feature_classifier.fit(x=X, y=y, batch_size=123, steps=7, monitors=[logging_hook])
Here the total number of data samples is 1000, with batch_size=123 and steps=7.
The output at each step is:
INFO:tensorflow:out = 123
INFO:tensorflow:out = 246 (0.004 sec)
INFO:tensorflow:out = 369 (0.003 sec)
INFO:tensorflow:out = 492 (0.003 sec)
INFO:tensorflow:out = 615 (0.003 sec)
INFO:tensorflow:out = 738 (0.003 sec)
INFO:tensorflow:out = 861 (0.003 sec)
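The running totals above can be reproduced without TensorFlow. The following plain-Python sketch assumes that each step takes the next batch_size samples in order with no reuse; that is a simplification for illustration, not a description of how the estimator feeds data internally.

```python
# 1000 samples, each equal to 1, with the batch_size and steps from the log
X = [1] * 1000
batch_size, steps = 123, 7

running_sum = 0
totals = []
for step in range(steps):
    # Take the next `batch_size` samples (assumed sequential, no reuse)
    batch = X[step * batch_size:(step + 1) * batch_size]
    running_sum += sum(batch)
    totals.append(running_sum)

print(totals)  # [123, 246, 369, 492, 615, 738, 861]

# Samples never read because steps * batch_size < len(X)
unused = len(X) - totals[-1]
print(unused)  # 139
```

The last total is 861 = 7 * 123, so 139 of the 1000 samples are never read in this run. Only batch_size * steps samples are consumed, no matter how many are supplied.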