What do you need for plotting the outcome of a question-answering model?
I have been working on a question-answering model, where I get answers to my questions from a BERT model. But I really want to plot something like this:
But the problem is, I don't really know how. I am really stuck on this. I don't know how to represent a part of the context in a plot. I do have two variables, answer_start and answer_end, which indicate which part of the context the model took its answer from. Can someone please help me out with this and tell me which variables I need to put in my pyplot?
Below is my code:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
import numpy as np
import pandas as pd

max_seq_length = 512

tokenizer = AutoTokenizer.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")

questions = [
    "Welke soorten gladiatoren waren er?",
    "Wat is een provocator?"
]

for question in questions:  # for each question, iterate over all lines
    print(f"Question: {question}")
    f = open("test.txt", "r")
    for line in f:
        text = str(line)  # the context must be a string
        # encode the question and the context with the tokenizer
        inputs = tokenizer.encode_plus(question,
                                       text,
                                       add_special_tokens=True,
                                       max_length=max_seq_length,
                                       truncation=True,
                                       return_tensors="pt")
        input_ids = inputs["input_ids"].tolist()[0]

        # still need to look up what ** does
        answer_start_scores, answer_end_scores = model(**inputs, return_dict=False)

        answer_start = torch.argmax(
            answer_start_scores
        )  # position with the highest start score
        answer_end = torch.argmax(
            answer_end_scores) + 1  # same, but for the end position

        answer = tokenizer.convert_tokens_to_string(
            tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

        # skip [CLS] and empty answers
        if answer == '[CLS]':
            continue
        elif answer == '':
            continue
        else:
            print(f"Answer: {answer}")
            print(f"Answer start: {answer_start}")
            print(f"Answer end: {answer_end}")
            f.seek(0)
            break
    # f.seek(0)
    # break
    f.close()
Also the output:
> Question: Welke soorten gladiatoren waren er?
> Answer: de thraex, de retiarius en de murmillo
> Answer start: 24
> Answer end: 37
> Question: Wat is een provocator?
> Answer: telemachus
> Answer start: 87
> Answer end: 90
I don't know if I understand what your problem is. But to make a plot similar to the one in the figure, I would do something like this:
import numpy as np
import matplotlib.pyplot as plt

plt.rcdefaults()

# Words of the sentence in which the answer is found (placeholder words)
sentence = ('list', 'of', 'words', 'that', 'make', 'up', 'the', 'sentence',
            'in', 'which', 'the', 'answer', 'is', 'found')
y_pos = np.arange(len(sentence))
# Placeholder values, one per word in sentence
probability = [0.1, 0.2, 0.1, 0.8, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

plt.bar(y_pos, probability, align='center', alpha=0.5)
plt.xticks(y_pos, sentence)
plt.ylabel('Answer probability')
plt.title('Words of the sentence')
plt.show()
So, assuming that the answer lies within a larger sentence or paragraph, I would put all the words of that sentence or paragraph on the x-axis of a bar plot (the variable sentence, which I suppose comes from test.txt), and on the y-axis the probability that a particular word is the first or last word of the answer (the variable probability). The two variables sentence and probability must have the same length: the first word in sentence corresponds to the first value in probability, and so on.
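Keeping the two lists aligned takes a little care with BERT, because its WordPiece tokenizer splits words into pieces and marks continuation pieces with a leading "##". Below is a minimal sketch of one way to merge pieces back into words while keeping one value per word. The helper name merge_wordpieces, the example tokens, and the choice to keep the highest value per word are all assumptions made for illustration, not something from the original code.

```python
def merge_wordpieces(tokens, probs):
    """Merge WordPiece continuation tokens ("##...") into whole words.

    Returns (words, word_probs) with one value per word. Taking the
    maximum over a word's pieces is an illustrative choice.
    """
    if len(tokens) != len(probs):
        raise ValueError("tokens and probs must have the same length")
    words, word_probs = [], []
    for token, prob in zip(tokens, probs):
        if token.startswith("##") and words:
            # Continuation piece: glue it onto the previous word.
            words[-1] += token[2:]
            word_probs[-1] = max(word_probs[-1], prob)
        else:
            words.append(token)
            word_probs.append(prob)
    return words, word_probs


# Hypothetical tokens and values, only to show the shape of the result.
words, word_probs = merge_wordpieces(
    ["de", "th", "##rae", "##x"], [0.1, 0.5, 0.2, 0.3]
)
```

The resulting words and word_probs can then take the place of sentence and probability in the bar plot.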
For instance, the words the model picks as the start and end of the answer (the positions with the highest values in answer_start_scores and answer_end_scores) will have the tallest bars in the plot, since they have the highest values in the probability list.
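To make the answer itself stand out, the bars between answer_start and answer_end can be given a different colour. This is a rough sketch under a few assumptions: span_colors and plot_answer_span are hypothetical helper names, answer_end is treated as exclusive (as in the question's code, where 1 is added to the argmax), and the colours are arbitrary.

```python
def span_colors(n_tokens, answer_start, answer_end,
                inside="tab:red", outside="lightgray"):
    """Return one colour per token; tokens in [answer_start, answer_end)
    get the 'inside' colour."""
    return [inside if answer_start <= i < answer_end else outside
            for i in range(n_tokens)]


def plot_answer_span(tokens, probs, answer_start, answer_end):
    """Bar plot of per-token values with the answer span highlighted."""
    # Imported here so span_colors can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    positions = list(range(len(tokens)))
    colors = span_colors(len(tokens), answer_start, answer_end)
    plt.bar(positions, probs, color=colors)
    plt.xticks(positions, tokens, rotation=90)
    plt.ylabel("Answer probability")
    plt.tight_layout()
    plt.show()
```

If answer_start and answer_end are PyTorch tensors, as in the question, converting them first with int(answer_start) and int(answer_end) keeps the comparison simple.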
Finally, answer_start_scores and answer_end_scores should contain a score for every position, indicating how likely that position is to be the first or last word of the answer.
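Note that the scores the model returns are raw logits rather than probabilities, so they usually need a softmax before they make sense on a probability axis. A minimal sketch in plain Python with made-up scores follows; with PyTorch tensors, torch.softmax(answer_start_scores, dim=-1) should give the same result.

```python
import math


def softmax(scores):
    """Turn a list of raw scores (logits) into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for four tokens, for illustration only.
start_probs = softmax([-1.0, 0.5, 3.0, 0.0])
```

The position of the largest value does not change, so torch.argmax picks the same token before and after the softmax.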
EDIT: Maybe you could also make two separate bar plots, one for the first word of the answer and one for the last word, and then join them together by adding the percentages.
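One possible reading of that idea is a stacked bar plot, where the end probability of each token sits on top of its start probability. The sketch below assumes both lists have already been converted to probabilities and have one value per token; combine_scores and plot_stacked are hypothetical names.

```python
def combine_scores(start_probs, end_probs):
    """Add start and end probabilities position by position."""
    if len(start_probs) != len(end_probs):
        raise ValueError("start_probs and end_probs must have the same length")
    return [s + e for s, e in zip(start_probs, end_probs)]


def plot_stacked(tokens, start_probs, end_probs):
    """Stacked bar plot: end probabilities drawn on top of start probabilities."""
    # Imported here so combine_scores can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    positions = list(range(len(tokens)))
    plt.bar(positions, start_probs, label="start")
    plt.bar(positions, end_probs, bottom=start_probs, label="end")
    plt.xticks(positions, tokens, rotation=90)
    plt.ylabel("Start + end probability")
    plt.legend()
    plt.tight_layout()
    plt.show()
```

The summed value can exceed 1, so it is better read as a combined score than as a probability.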