RuntimeError: The size of tensor a (4000) must match the size of tensor b (512) at non-singleton dimension 1
The expanded size of the tensor (1011) must match the existing size (512) at non-singleton dimension 1
I trained a LayoutLMv2 model from huggingface, and when I try to run inference on a single image, it gives a runtime error. The code is as follows:
query = '/Users/vaihabsaxena/Desktop/Newfolder/labeled/Others/Two.pdf26.png'
image = Image.open(query).convert("RGB")
encoded_inputs = processor(image, return_tensors="pt").to(device)
outputs = model(**encoded_inputs)
preds = torch.softmax(outputs.logits, dim=1).tolist()[0]
pred_labels = {label:pred for label, pred in zip(label2idx.keys(), preds)}
pred_labels
The error occurs when I execute model(**encoded_inputs). The processor is loaded directly from Huggingface and initialized along with the other APIs as follows:
feature_extractor = LayoutLMv2FeatureExtractor()
tokenizer = LayoutLMv2Tokenizer.from_pretrained("microsoft/layoutlmv2-base-uncased")
processor = LayoutLMv2Processor(feature_extractor, tokenizer)
The model is defined and trained as follows:
model = LayoutLMv2ForSequenceClassification.from_pretrained(
"microsoft/layoutlmv2-base-uncased", num_labels=len(label2idx)
)
model.to(device);
optimizer = AdamW(model.parameters(), lr=5e-5)
num_epochs = 3
for epoch in range(num_epochs):
    print("Epoch:", epoch)
    training_loss = 0.0
    training_correct = 0
    # put the model in training mode
    model.train()
    for batch in tqdm(train_dataloader):
        outputs = model(**batch)
        loss = outputs.loss
        training_loss += loss.item()
        predictions = outputs.logits.argmax(-1)
        training_correct += (predictions == batch['labels']).float().sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print("Training Loss:", training_loss / batch["input_ids"].shape[0])
    training_accuracy = 100 * training_correct / len(train_data)
    print("Training accuracy:", training_accuracy.item())

    validation_loss = 0.0
    validation_correct = 0
    for batch in tqdm(valid_dataloader):
        outputs = model(**batch)
        loss = outputs.loss
        validation_loss += loss.item()
        predictions = outputs.logits.argmax(-1)
        validation_correct += (predictions == batch['labels']).float().sum()
    print("Validation Loss:", validation_loss / batch["input_ids"].shape[0])
    validation_accuracy = 100 * validation_correct / len(valid_data)
    print("Validation accuracy:", validation_accuracy.item())
The full error trace:
RuntimeError Traceback (most recent call last)
/Users/vaihabsaxena/Desktop/Newfolder/pytorch.ipynb Cell 37 in <cell line: 4>()
2 image = Image.open(query).convert("RGB")
3 encoded_inputs = processor(image, return_tensors="pt").to(device)
----> 4 outputs = model(**encoded_inputs)
5 preds = torch.softmax(outputs.logits, dim=1).tolist()[0]
6 pred_labels = {label:pred for label, pred in zip(label2idx.keys(), preds)}
File ~/opt/anaconda3/envs/env_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
1126 # If we don't have any hooks, we want to skip the rest of the logic in
1127 # this function, and just call forward.
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
File ~/opt/anaconda3/envs/env_pytorch/lib/python3.9/site-packages/transformers/models/layoutlmv2/modeling_layoutlmv2.py:1071, in LayoutLMv2ForSequenceClassification.forward(self, input_ids, bbox, image, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
1061 visual_position_ids = torch.arange(0, visual_shape[1], dtype=torch.long, device=device).repeat(
1062 input_shape[0], 1
1063 )
1065 initial_image_embeddings = self.layoutlmv2._calc_img_embeddings(
1066 image=image,
1067 bbox=visual_bbox,
...
896 input_shape[0], 1
897 )
898 final_position_ids = torch.cat([position_ids, visual_position_ids], dim=1)
RuntimeError: The expanded size of the tensor (1011) must match the existing size (512) at non-singleton dimension 1. Target sizes: [1, 1011]. Tensor sizes: [1, 512]
I tried configuring the tokenizer to cut the text off at a maximum length, but the error persists when encoding the inputs together with the image. What is going wrong here?
The error message is telling you that the text extracted via OCR (1011 tokens) is longer than what the underlying model can handle (512 tokens). Depending on your task, you may be able to truncate the text with the tokenizer parameter truncation (the processor passes this parameter on to the tokenizer):
import torch
from transformers import LayoutLMv2FeatureExtractor, LayoutLMv2Tokenizer, LayoutLMv2Processor, LayoutLMv2ForSequenceClassification
from PIL import Image, ImageDraw, ImageFont
query = "/content/Screenshot_20220905_202551.png"
image = Image.open(query).convert("RGB")
feature_extractor = LayoutLMv2FeatureExtractor()
tokenizer = LayoutLMv2Tokenizer.from_pretrained("microsoft/layoutlmv2-base-uncased")
processor = LayoutLMv2Processor(feature_extractor, tokenizer)
model = LayoutLMv2ForSequenceClassification.from_pretrained("microsoft/layoutlmv2-base-uncased", num_labels=2)
encoded_inputs = processor(image, return_tensors="pt")
# Model will raise an error because the sequence is longer than the trained position embeddings
print(encoded_inputs["input_ids"].shape)
encoded_inputs = processor(image, return_tensors="pt", truncation=True)
print(encoded_inputs["input_ids"].shape)
outputs = model(**encoded_inputs)
preds = torch.softmax(outputs.logits, dim=1).tolist()[0]
Output:
torch.Size([1, 644])
torch.Size([1, 512])
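For reference, the 512-token ceiling comes from the size of the checkpoint's learned position embeddings, so no sequence longer than that can pass through the encoder. A minimal check (assuming the same microsoft/layoutlmv2-base-uncased checkpoint used above) confirms the limit the truncated sequence must respect:

from transformers import LayoutLMv2Config

# The 512-token limit is baked into the checkpoint's position embeddings
config = LayoutLMv2Config.from_pretrained("microsoft/layoutlmv2-base-uncased")
print(config.max_position_embeddings)  # 512

Note that with truncation=True the processor also truncates the matching bounding boxes, so input_ids and bbox stay aligned.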