Loss is not converging in PyTorch but does in TensorFlow
Epoch: 1 Training Loss: 0.816370 Validation Loss: 0.696534
Validation loss decreased (inf --> 0.696534). Saving model ...
Epoch: 2 Training Loss: 0.507756 Validation Loss: 0.594713
Validation loss decreased (0.696534 --> 0.594713). Saving model ...
Epoch: 3 Training Loss: 0.216438 Validation Loss: 1.119294
Epoch: 4 Training Loss: 0.191799 Validation Loss: 0.801231
Epoch: 5 Training Loss: 0.111334 Validation Loss: 1.753786
Epoch: 6 Training Loss: 0.064309 Validation Loss: 1.348847
Epoch: 7 Training Loss: 0.058158 Validation Loss: 1.839139
Epoch: 8 Training Loss: 0.015489 Validation Loss: 1.370469
Epoch: 9 Training Loss: 0.082856 Validation Loss: 1.701200
Epoch: 10 Training Loss: 0.003859 Validation Loss: 2.657933
Epoch: 11 Training Loss: 0.018133 Validation Loss: 0.593986
Validation loss decreased (0.594713 --> 0.593986). Saving model ...
Epoch: 12 Training Loss: 0.160197 Validation Loss: 1.499911
Epoch: 13 Training Loss: 0.012942 Validation Loss: 1.879732
Epoch: 14 Training Loss: 0.002037 Validation Loss: 2.399405
Epoch: 15 Training Loss: 0.035908 Validation Loss: 1.960887
Epoch: 16 Training Loss: 0.051137 Validation Loss: 2.226335
Epoch: 17 Training Loss: 0.003953 Validation Loss: 2.619108
Epoch: 18 Training Loss: 0.000381 Validation Loss: 2.746541
Epoch: 19 Training Loss: 0.094646 Validation Loss: 3.555713
Epoch: 20 Training Loss: 0.022620 Validation Loss: 2.833098
Epoch: 21 Training Loss: 0.004800 Validation Loss: 4.181845
Epoch: 22 Training Loss: 0.014128 Validation Loss: 1.933705
Epoch: 23 Training Loss: 0.026109 Validation Loss: 2.888344
Epoch: 24 Training Loss: 0.000768 Validation Loss: 3.029443
Epoch: 25 Training Loss: 0.000327 Validation Loss: 3.079959
Epoch: 26 Training Loss: 0.000121 Validation Loss: 3.578420
Epoch: 27 Training Loss: 0.148478 Validation Loss: 3.297387
Epoch: 28 Training Loss: 0.030328 Validation Loss: 2.218535
Epoch: 29 Training Loss: 0.001673 Validation Loss: 2.934132
Epoch: 30 Training Loss: 0.000253 Validation Loss: 3.215722
My loss is not converging. I am working on the Horses or Humans dataset. There is an official TensorFlow notebook for it, and it worked like a charm. When I try to replicate the same thing in PyTorch, the loss does not converge. Can you please have a look?

I am using criterion = nn.BCEWithLogitsLoss() and optimizer = optim.RMSprop(model.parameters(), lr=0.001). Although the setup seems to have some effect on the training loss, the validation losses look like random numbers and do not form any pattern. What could be the possible reasons for the loss not converging?
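As a side note on the loss setup: nn.BCEWithLogitsLoss fuses a sigmoid with binary cross-entropy, so it expects the raw logit that the final linear layer produces, with no activation on top. A minimal standalone sketch of the numerically stable formula the PyTorch docs give (the concrete logit/target values here are just for illustration):

```python
import math

def bce_with_logits(z, y):
    # Numerically stable binary cross-entropy on a raw logit z
    # against a target y in {0, 1}:
    #   max(z, 0) - z * y + log(1 + exp(-|z|))
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# A confident, correct logit gives a small loss; the same logit
# against the opposite target gives a large one.
print(bce_with_logits(4.0, 1.0))  # small (about 0.018)
print(bce_with_logits(4.0, 0.0))  # large (about 4.018)
```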
This is my CNN architecture:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (sees 300x300x3 image tensor)
        self.conv1 = nn.Conv2d(3, 16, 3)
        # convolutional layer (sees 149x149x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3)
        # convolutional layer (sees 73x73x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3)
        # convolutional layer (sees 35x35x64 tensor)
        self.conv4 = nn.Conv2d(64, 64, 3)
        # convolutional layer (sees 16x16x64 tensor)
        self.conv5 = nn.Conv2d(64, 64, 3)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 7 * 7 -> 512)
        self.fc1 = nn.Linear(3136, 512)
        # linear layer (512 -> 1)
        self.fc2 = nn.Linear(512, 1)
        # dropout layer (p=0.25)
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        # sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = self.pool(F.relu(self.conv5(x)))
        # flatten image input
        x = x.view(-1, 64 * 7 * 7)
        # dropout layer
        x = self.dropout(x)
        # 1st fully connected layer, with relu activation
        x = F.relu(self.fc1(x))
        # dropout layer
        x = self.dropout(x)
        # output layer (raw logit for BCEWithLogitsLoss)
        x = self.fc2(x)
        return x
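The flatten size 64 * 7 * 7 can be sanity-checked by tracing the spatial dimension through the five conv/pool stages, assuming the 300x300 images of the Horses or Humans dataset:

```python
# Trace the spatial size through five (3x3 conv, no padding) ->
# (2x2 max pool, stride 2) stages, assuming 300x300 input images.
size = 300
for _ in range(5):
    size -= 2   # 3x3 conv without padding shrinks each side by 2
    size //= 2  # 2x2 max pool with stride 2 halves it (floor)
print(size)     # 7, so the flattened feature vector is 64 * 7 * 7 = 3136
```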
This is the complete Jupyter notebook. Apologies for not being able to create a minimal reproducible example.
I think the problem is in the dataloaders: I noticed that you're not passing the samplers to the loaders here:
# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=16,
    num_workers=0,
    shuffle=True,
)
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=16,
    num_workers=0,
    shuffle=True,
)
I have never used samplers myself, so I don't know how to use them correctly, but I suppose you wanted to do something like this:
# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    sampler=train_sampler,
    batch_size=16,
    num_workers=0,
)
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    sampler=valid_sampler,
    batch_size=16,
    num_workers=0,
)
And according to the docs:

sampler (Sampler, optional) – defines the strategy to draw samples from the dataset. If specified, shuffle must be False.
So if you are using samplers, you should turn off shuffle.
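For completeness, train_idx and valid_idx in the snippets above are just disjoint lists of dataset indices. A hypothetical 80/20 split (the dataset length here is assumed for illustration) could be built like this:

```python
import random

# Hypothetical 80/20 split into the train_idx / valid_idx index
# lists that the SubsetRandomSampler calls above expect.
num_samples = 1000  # assumed dataset length, for illustration only
indices = list(range(num_samples))
random.Random(0).shuffle(indices)  # seeded for reproducibility
split = int(0.8 * num_samples)
train_idx, valid_idx = indices[:split], indices[split:]
print(len(train_idx), len(valid_idx))  # 800 200
```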