
Why is the TensorBoard summary not updating?

I use TensorBoard with PyTorch 1.1 to log loss values.

I call writer.add_scalar("loss", loss.item(), global_step) in every iteration of the for loop.

However, the plot does not update while training is in progress.

Every time I want to see the latest loss, I have to restart the TensorBoard server.

Here is the code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torch.utils.tensorboard import SummaryWriter
from torchvision import datasets, transforms

# Writer will output to ./runs/ directory by default
writer = SummaryWriter()

transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]
)
trainset = datasets.MNIST("mnist_train", train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
model = torchvision.models.resnet50(False)
# Have ResNet model take in grayscale rather than RGB
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(2048, 10, True)

criterion = nn.CrossEntropyLoss()

epochs = 100

opt = torch.optim.Adam(model.parameters())

niter = 0

for epoch in range(epochs):
    for step, (x, y) in enumerate(trainloader):
        yp = model(x)
        loss = criterion(yp, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        writer.add_scalar("loss", loss.item(), niter)
        niter += 1
        print(loss.item())

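# `images` was never defined above; take one batch from the loader for the grid and graph
images, _ = next(iter(trainloader))
grid = torchvision.utils.make_grid(images)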
writer.add_image("images", grid, 0)
writer.add_graph(model, images)
writer.close()

Training is still running and the global step has already reached 3594, but TensorBoard still shows only around 1900.


Also, if a single run has multiple event log files, you need to launch TensorBoard with --reload_multifile True.
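Whether --reload_multifile matters depends on whether a single run directory actually contains more than one event file (for example, because a new SummaryWriter was created for the same log directory after a restart). As a quick check, here is a minimal pure-Python sketch that walks a log directory and reports directories holding several files whose names start with events.out.tfevents, the prefix TensorBoard event files use. The helper name find_multi_file_runs is hypothetical and is not part of TensorBoard or PyTorch.

```python
import os
from collections import defaultdict

# Prefix that TensorBoard event file names start with.
EVENT_PREFIX = "events.out.tfevents"


def find_multi_file_runs(logdir):
    """Hypothetical helper (not part of TensorBoard or PyTorch).

    Walks `logdir` and returns {run_dir: sorted event file names} for every
    directory that contains more than one event file.
    """
    by_dir = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(logdir):
        for name in filenames:
            if name.startswith(EVENT_PREFIX):
                by_dir[dirpath].append(name)
    # Keep only directories where TensorBoard would have to read several files.
    return {d: sorted(names) for d, names in by_dir.items() if len(names) > 1}
```

For the question's setup, find_multi_file_runs("runs") checks the default ./runs directory; a non-empty result suggests that launching with --reload_multifile True is worth trying.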

The logging side buffers events internally before writing them to disk. To check whether that is the issue, create your SummaryWriter with

writer = SummaryWriter(flush_secs=1)

and check whether the plot updates right away. If it does, tune flush_secs (default: 120 seconds) to suit your case. From your description, though, the problem may be on the TensorBoard visualization side instead, in which case it is probably related to how often TensorBoard polls the log directory for new data.
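To make the logging-side explanation concrete, here is a toy, pure-Python model: events are buffered in memory and only become visible to a reader once a flush happens, either because flush_secs have elapsed since the last flush or because flush() is called explicitly. This is a simplified illustration under assumed timing, not PyTorch's actual event writer (which flushes from a background thread and uses a bounded queue); the class BufferedLog, the function simulate, and the simulated clock are all hypothetical.

```python
class BufferedLog:
    """Toy model (hypothetical) of a time-based buffered event writer."""

    def __init__(self, flush_secs=120.0):
        self.flush_secs = flush_secs
        self._buffer = []       # events logged but not yet "on disk"
        self.on_disk = []       # events a reader such as TensorBoard could see
        self._last_flush = 0.0  # simulated time of the last flush

    def add_scalar(self, tag, value, step, now):
        """Buffer one event; flush if flush_secs have passed since the last flush."""
        self._buffer.append((tag, value, step))
        if now - self._last_flush >= self.flush_secs:
            self.flush(now)

    def flush(self, now):
        """Move all buffered events to the visible 'on disk' list."""
        self.on_disk.extend(self._buffer)
        self._buffer.clear()
        self._last_flush = now

    def latest_visible_step(self):
        return self.on_disk[-1][2] if self.on_disk else None


def simulate(flush_secs, steps, secs_per_step):
    """Log `steps` losses at a fixed (simulated) pace; return the newest visible step."""
    log = BufferedLog(flush_secs)
    for step in range(steps):
        log.add_scalar("loss", 1.0 / (step + 1), step, now=step * secs_per_step)
    return log.latest_visible_step()


# With the default 120 s, the reader lags behind training; with 1 s it is nearly current.
print(simulate(120.0, 300, 1.0))
print(simulate(1.0, 300, 1.0))
```

In this model a reader lags the trainer by up to flush_secs worth of steps, which matches the symptom of TensorBoard showing an older step than the training loop. Recent PyTorch versions also expose SummaryWriter.flush() for forcing a write. On the visualization side, TensorBoard's own polling frequency is controlled by its --reload_interval flag (in seconds).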

Does installing TensorFlow (which makes TensorBoard use a different filesystem backend) change this behavior for you?
