
TensorBoard doesn't update scalars for TensorFlow Slim

TensorBoard runs and I can see the plots, however the losses do not appear to update. I can print the batch losses at each step, so I am uncertain why TensorBoard is not reflecting them.

[Screenshot]

I am attempting to learn about TensorFlow Slim, following this example:

Attempted

  • I attempted to add a FileWriter, even though the tutorial does not appear to have one. Does TensorFlow-Slim still require an explicit FileWriter? I see that there is a summary_writer parameter in slim.learning.train, but is it required, or does tf.summary.scalar get picked up by default? Regardless, it does not appear to affect the graphs (see the sketch after this list).
  • Setting trace_every_n_steps to various values (1, 2, 5).
  • I have also deleted and regenerated the generated files (checkpoint, events.out.tfevents.*, graph.pbtxt, model.ckpt-0.data/index/meta, etc.).
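
For reference, this is a minimal sketch of how I would wire the summary op and writer in explicitly, using the names from the code further down. It assumes, from skimming the slim source, that slim.learning.train otherwise falls back to tf.summary.merge_all() and a FileWriter created from logdir; I have not confirmed that.

# Sketch (assumption, not confirmed): pass the merged summary op and a
# FileWriter explicitly instead of relying on slim's defaults.
summary_op = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(train_dir)

final_loss = slim.learning.train(
    train_op,
    logdir=train_dir,
    init_fn=get_init_fn(),
    number_of_steps=2,
    summary_op=summary_op,        # slim's default is tf.summary.merge_all()
    summary_writer=train_writer)  # slim's default is a writer on logdir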

CODE

My code is largely from the "Fine-tune the model on a different set of labels" section of the tutorial.

Explicitly:

import os

from preprocessing import inception_preprocessing
import numpy as np
import tensorflow as tf
from datasets import flowers
from nets import inception

from tensorflow.contrib import slim
import matplotlib.pyplot as plt


image_size = inception.inception_v1.default_image_size
checkpoints_dir = './tmp/checkpoints'
flowers_data_dir = './tmp/data/tf_records'


if not tf.gfile.Exists(checkpoints_dir):
    tf.gfile.MakeDirs(checkpoints_dir)

def load_batch(dataset, batch_size=32, height=299, width=299, is_training=False):
    """Loads a single batch of data.

    Args:
      dataset: The dataset to load.
      batch_size: The number of images in the batch.
      height: The size of each image after preprocessing.
      width: The size of each image after preprocessing.
      is_training: Whether or not we're currently training or evaluating.

    Returns:
      images: A Tensor of size [batch_size, height, width, 3], image samples that have been preprocessed.
      images_raw: A Tensor of size [batch_size, height, width, 3], image samples that can be used for visualization.
      labels: A Tensor of size [batch_size], whose values range between 0 and dataset.num_classes.
    """
    data_provider = slim.dataset_data_provider.DatasetDataProvider(
        dataset, common_queue_capacity=32,
        common_queue_min=8)
    image_raw, label = data_provider.get(['image', 'label'])

    # Preprocess image for usage by Inception.
    image = inception_preprocessing.preprocess_image(image_raw, height, width, is_training=is_training)

    # Preprocess the image for display purposes.
    image_raw = tf.expand_dims(image_raw, 0)
    image_raw = tf.image.resize_images(image_raw, [height, width])
    image_raw = tf.squeeze(image_raw)

    # Batch it up.
    images, images_raw, labels = tf.train.batch(
        [image, image_raw, label],
        batch_size=batch_size,
        num_threads=1,
        capacity=2 * batch_size)

    return images, images_raw, labels


def get_init_fn():
    """Returns a function run by the chief worker to warm-start the training."""
    checkpoint_exclude_scopes = ["InceptionV1/Logits", "InceptionV1/AuxLogits"]

    exclusions = [scope.strip() for scope in checkpoint_exclude_scopes]

    variables_to_restore = []
    for var in slim.get_model_variables():
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)

    return slim.assign_from_checkpoint_fn(
        os.path.join(checkpoints_dir, 'inception_v1.ckpt'),
        variables_to_restore)


train_dir = './tmp/inception_finetuned/'

with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)
    dataset = flowers.get_split('train', flowers_data_dir)

    images, _, labels = load_batch(dataset, height=image_size, width=image_size)

    # Create the model, use the default arg scope to configure the batch norm parameters.
    with slim.arg_scope(inception.inception_v1_arg_scope()):
        logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)

    # Specify the loss function:
    one_hot_labels = slim.one_hot_encoding(labels, dataset.num_classes)
    slim.losses.softmax_cross_entropy(logits, one_hot_labels)
    total_loss = slim.losses.get_total_loss()

    # Create some summaries to visualize the training process:
    tf.summary.scalar('losses/Total_Loss', total_loss)

    # Specify the optimizer and create the train op:
    optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
    train_op = slim.learning.create_train_op(total_loss, optimizer)

    # Explicit FileWriter (see "Attempted" above); passed to slim.learning.train below.
    train_writer = tf.summary.FileWriter(train_dir)
    # train_writer.add_summary()

    # Run the training:
    final_loss = slim.learning.train(
        train_op,
        logdir=train_dir,
        init_fn=get_init_fn(),
        number_of_steps=2,
        summary_writer=train_writer,
        trace_every_n_steps=2)

print('Finished training. Last batch loss %f' % final_loss)

Related

There are a large number of questions related to TensorBoard issues, but in those cases TensorBoard doesn't run at all, shows nothing, or gives some kind of error. In my case, TensorBoard appears to run without errors and I can see the losses plot generated; it just doesn't appear to grab more than one value.

Answer

I switched the code around and went down some very different rabbit holes, but I think the problem was that I was overlooking the save_summaries_secs flag: if it is set high (e.g. the default of 600 seconds), it takes a while for the metrics to update.
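
A minimal sketch of that fix applied to the training call above, assuming that lowering save_summaries_secs is the only change needed:

# Sketch of the fix: flush summaries every second instead of the 600s default,
# so the scalar plot in TensorBoard picks up more than the first value.
final_loss = slim.learning.train(
    train_op,
    logdir=train_dir,
    init_fn=get_init_fn(),
    number_of_steps=2,
    save_summaries_secs=1,   # default is 600 seconds
    trace_every_n_steps=2)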
