In my PyTorch train iterator, how do I resolve the ValueError: only one element tensors can be converted to Python scalars?
I am closely following a tutorial on building a question-answering bot in PyTorch. During training, however, my code fails to save a checkpoint and raises the aforementioned ValueError. The error occurs at torch.save(torch.tensor(train_loss_set), os.path.join(output_dir, 'training_loss.pt')).
Below is the code for my train iterator:
num_train_epochs = 1
print("***** Running training *****")
print("  Num examples = %d" % len(dataset))
print("  Num Epochs = %d" % num_train_epochs)
print("  Batch size = %d" % batch_size)
print("  Total optimization steps = %d" % (len(train_dataloader) // num_train_epochs))
model.zero_grad()
train_iterator = trange(num_train_epochs, desc="Epoch")
set_seed()
for _ in train_iterator:
    epoch_iterator = tqdm(train_dataloader, desc="Iteration")
    for step, batch in enumerate(epoch_iterator):
        if step < global_step + 1:
            continue
        model.train()
        batch = tuple(t.to(device) for t in batch)
        inputs = {'input_ids': batch[0],
                  'attention_mask': batch[1],
                  'token_type_ids': batch[2],
                  'start_positions': batch[3],
                  'end_positions': batch[4]}
        outputs = model(**inputs)
        loss = outputs[0]
        train_loss_set.append(loss)
        loss.sum().backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        tr_loss += loss.sum().item()
        optimizer.step()
        model.zero_grad()
        global_step += 1
        if global_step % 1000 == 0:
            print("Train loss: {}".format(tr_loss/global_step))
            output_dir = 'checkpoints/checkpoint-{}'.format(global_step)
            if not os.path.exists(output_dir):
                os.makedirs(output_dir)
            model_to_save = model.module if hasattr(model, 'module') else model  # Take care of distributed/parallel training
            model_to_save.save_pretrained(output_dir)
            torch.save(torch.tensor(train_loss_set), os.path.join(output_dir, 'training_loss.pt'))
            print("Saving model checkpoint to %s" % output_dir)
Edit: print(train_loss_set[:10]) returns the following:
[tensor([5.7099, 5.7395], device='cuda:0', grad_fn=<GatherBackward>), tensor([5.2470, 5.4016], device='cuda:0', grad_fn=<GatherBackward>), tensor([5.1311, 5.0390], device='cuda:0', grad_fn=<GatherBackward>), tensor([4.4326, 4.8475], device='cuda:0', grad_fn=<GatherBackward>), tensor([3.4740, 3.9955], device='cuda:0', grad_fn=<GatherBackward>), tensor([4.8710, 4.5907], device='cuda:0', grad_fn=<GatherBackward>), tensor([4.4294, 4.3013], device='cuda:0', grad_fn=<GatherBackward>), tensor([2.7536, 2.9540], device='cuda:0', grad_fn=<GatherBackward>), tensor([3.8989, 3.3436], device='cuda:0', grad_fn=<GatherBackward>), tensor([3.3534, 3.2532], device='cuda:0', grad_fn=<GatherBackward>)]
Could this be related to the fact that I am using DataParallel?
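For reference, the failure can be reproduced without any model: torch.tensor raises this ValueError when given a list of tensors that each hold more than one element, which is exactly what the two-element per-GPU losses above are. A minimal sketch (the numbers are taken from the printout above; no GPU needed):

```python
import torch

# Two-element tensors, mimicking the per-GPU losses gathered by DataParallel
losses = [torch.tensor([5.7099, 5.7395]), torch.tensor([5.2470, 5.4016])]

try:
    torch.tensor(losses)  # fails: the list elements are not scalar tensors
    raised = False
except ValueError as e:
    raised = True
    print(e)  # prints the ValueError message from the question
```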
This is a quirk of PyTorch: torch.tensor cannot build a tensor from a list of tensors whose elements are not scalars, and here each loss holds two values, one per GPU, gathered by DataParallel (as the grad_fn=<GatherBackward> in your printout shows). You can fix it in any of three ways.
1. Skip the torch.tensor call and save the list of tensors directly, so this should work: torch.save(train_loss_set, os.path.join(output_dir, 'training_loss.pt'))
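Saving the raw list round-trips fine, since torch.save serializes arbitrary Python objects, not just tensors. A sketch using an in-memory buffer in place of the checkpoint path:

```python
import io
import torch

losses = [torch.tensor([5.7, 5.7]), torch.tensor([5.2, 5.4])]

buf = io.BytesIO()
torch.save(losses, buf)   # a plain Python list of tensors serializes fine
buf.seek(0)
restored = torch.load(buf)
print(len(restored))      # the full list comes back
```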
2. Use torch.stack instead: torch.save(torch.stack(train_loss_set), os.path.join(output_dir, 'training_loss.pt'))
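torch.stack joins the list along a new leading dimension, so N two-element losses become one (N, 2) tensor, which is what torch.tensor was presumably meant to produce. A sketch:

```python
import torch

# Three two-element "losses", as DataParallel on two GPUs would produce
losses = [torch.tensor([1.0, 2.0]),
          torch.tensor([3.0, 4.0]),
          torch.tensor([5.0, 6.0])]

stacked = torch.stack(losses)  # new dim 0 indexes the steps
print(stacked.shape)           # torch.Size([3, 2])
```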
3. Convert each loss to an ndarray before appending it; torch.tensor does accept a list of ndarrays: train_loss_set.append(loss.cpu().detach().numpy())
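Here .cpu().detach().numpy() moves each loss off the GPU and out of the autograd graph before it is stored, so the later torch.tensor call succeeds (note the stored copies are detached, which is fine for logging but means they can no longer backpropagate). A sketch:

```python
import torch

train_loss_set = []
# Stand-ins for per-step losses that carry gradients
for loss in [torch.tensor([1.0, 2.0], requires_grad=True),
             torch.tensor([3.0, 4.0], requires_grad=True)]:
    # Detach from the graph and convert to an ndarray before storing
    train_loss_set.append(loss.cpu().detach().numpy())

as_tensor = torch.tensor(train_loss_set)  # now succeeds
print(as_tensor.shape)                    # torch.Size([2, 2])
```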