What do you need to plot the results of a question answering model

Question

I have been working on a question answering model that answers my questions using my word embedding model, BERT. What I really want is to plot something like this:

The problem is that I don't really know how, and I am stuck on this task. I don't know how to represent a part of the context in a plot. I do have two variables, answer_start and answer_end, which indicate which part of the context the model took its answer from. Can someone please help me out and tell me which variables I need to put in my pyplot?

My code is below:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
import numpy as np
import pandas as pd

max_seq_length = 512

tokenizer = AutoTokenizer.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")

questions = [
    "Welke soorten gladiatoren waren er?",
    "Wat is een provocator?"
]
for question in questions:  # for each question, iterate over all lines of the file
    print(f"Question: {question}")
    with open("test.txt", "r") as f:  # reopened per question, closed automatically
        for line in f:
            text = str(line)  # the context must be a string
            # tokenize and encode the question/context pair
            inputs = tokenizer.encode_plus(question,
                                           text,
                                           add_special_tokens=True,
                                           max_length=max_seq_length,
                                           truncation=True,
                                           return_tensors="pt")
            input_ids = inputs["input_ids"].tolist()[0]

            # note to self: look up what ** does (it unpacks the inputs dict into keyword arguments)
            answer_start_scores, answer_end_scores = model(**inputs, return_dict=False)

            # index of the token with the highest start score
            answer_start = torch.argmax(answer_start_scores)
            # same for the end token; +1 makes the end index exclusive
            answer_end = torch.argmax(answer_end_scores) + 1
            answer = tokenizer.convert_tokens_to_string(
                tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

            # skip '[CLS]' and empty answers
            if answer in ('[CLS]', ''):
                continue
            print(f"Answer: {answer}")
            print(f"Answer start: {answer_start}")
            print(f"Answer end: {answer_end}")
            break  # stop at the first line that yields a usable answer

And the output:

> Question: Welke soorten gladiatoren waren er?
> Answer: de thraex, de retiarius en de murmillo
> Answer start: 24
> Answer end: 37
> Question: Wat is een provocator?
> Answer: telemachus
> Answer start: 87
> Answer end: 90

Answer 1

I'm not sure I fully understand your problem, but to make a plot similar to the one in the figure, I would do something like this:

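import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()

# one entry per word; the commas matter, without them Python concatenates the strings
sentence = ('list', 'of', 'words', 'that', 'make', 'up', 'the', 'sentence', 'in', 'which', 'the', 'answer', 'is', 'found')
y_pos = np.arange(len(sentence))
# placeholder values, one per word (must have the same length as sentence)
probability = [0.1, 0.2, 0.1, 0.8, 0.6, 0.1, 0.1, 0.3, 0.1, 0.1, 0.1, 0.7, 0.2, 0.1]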

plt.bar(y_pos, probability, align='center', alpha=0.5)
plt.xticks(y_pos, sentence)
plt.ylabel('Answer probability')
plt.title('Words of the sentence')

plt.show()

So, assuming the answer lies within a larger sentence or paragraph, I would put all the words of that sentence or paragraph on the x axis of a bar plot (the variable sentence, which I suppose comes from test.txt), and on the y axis the probability that each word is the first or last word of the answer (the variable probability). The two variables must have the same length: the first entry of sentence corresponds to the first value of probability, and so on.
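
As a minimal sketch of where the probability values could come from: the model's start and end scores are raw logits, one per input token, so a softmax turns them into probabilities that sum to 1. The helper name `softmax`, the token list, and the numbers below are assumptions made for illustration. With real model output you would pass something like `answer_start_scores[0].detach().tolist()` and use `tokenizer.convert_ids_to_tokens(input_ids)` as the x-axis labels.

```python
import math


def softmax(scores):
    """Convert a list of raw scores (logits) into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: hypothetical tokens and made-up start logits, one per token.
tokens = ["[CLS]", "Wat", "is", "een", "provocator", "?", "[SEP]"]
start_logits = [-2.0, 0.5, 0.1, 1.0, 4.0, -1.0, -3.0]

start_probs = softmax(start_logits)

# The token with the highest probability is the most likely start of the answer.
best = max(range(len(tokens)), key=lambda i: start_probs[i])
print(tokens[best], round(start_probs[best], 3))
```

The resulting list has one value per token, so it can be passed to plt.bar in place of the hand-written probability list. On a tensor, `torch.softmax(answer_start_scores, dim=-1)` computes the same thing.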

For instance, the words at which answer_start_scores and answer_end_scores reach their maximum are the most likely start and end of the answer, so their bars will be the tallest in the plot (the highest values in probability).
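
One way to represent the answer span itself in the plot, sketched under the assumption that answer_start and answer_end are token indices with an exclusive end (as in the question's code), is to colour the bars inside the span differently. The function names `span_colors` and `plot_span`, the colours, and the toy data are illustrative choices, not part of the original answer.

```python
def span_colors(n_tokens, answer_start, answer_end,
                inside="tab:orange", outside="tab:blue"):
    """Return one colour per token; tokens in [answer_start, answer_end) get `inside`."""
    return [inside if answer_start <= i < answer_end else outside
            for i in range(n_tokens)]


def plot_span(tokens, probs, answer_start, answer_end):
    """Bar plot of per-token probabilities with the answer span highlighted."""
    import matplotlib.pyplot as plt  # imported lazily so span_colors works without it

    positions = list(range(len(tokens)))
    colors = span_colors(len(tokens), answer_start, answer_end)
    plt.bar(positions, probs, color=colors)
    plt.xticks(positions, tokens, rotation=90)
    plt.ylabel("Answer probability")
    plt.tight_layout()
    plt.show()


# Toy data (made up): six tokens, with the answer covering tokens 3 and 4.
tokens = ["een", "provocator", "was", "een", "gladiator", "."]
colors = span_colors(len(tokens), answer_start=3, answer_end=5)
print(colors)
```

Calling `plot_span(tokens, probs, int(answer_start), int(answer_end))` with one probability per token would then draw the highlighted plot; the `int(...)` conversion is there because the question's code keeps the indices as tensors.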

Finally, answer_start_scores and answer_end_scores should already contain a score for every word, indicating how likely that word is to be the start or the end of the answer. These are the values to plot.

EDIT: You could also make two separate bar plots, one for the first word of the answer and one for the last, and then combine them by adding the percentages.
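
The idea in the edit could be sketched roughly as follows: one panel for the start probabilities, one for the end probabilities, and a third for their element-wise sum. The names `combine_scores` and `plot_start_end` and the numbers are hypothetical. Note that the summed values can exceed 1, so the third panel is a combined score rather than a probability.

```python
def combine_scores(start_probs, end_probs):
    """Add start and end probabilities token by token."""
    if len(start_probs) != len(end_probs):
        raise ValueError("start_probs and end_probs must have the same length")
    return [s + e for s, e in zip(start_probs, end_probs)]


def plot_start_end(tokens, start_probs, end_probs):
    """Three stacked bar plots: start, end, and combined scores per token."""
    import matplotlib.pyplot as plt  # imported lazily so combine_scores works without it

    positions = list(range(len(tokens)))
    combined = combine_scores(start_probs, end_probs)
    fig, axes = plt.subplots(3, 1, sharex=True)
    panels = [(start_probs, "Start"), (end_probs, "End"), (combined, "Start + end")]
    for ax, (values, label) in zip(axes, panels):
        ax.bar(positions, values, align="center", alpha=0.5)
        ax.set_ylabel(label)
    # Only the bottom panel needs the token labels because the x axis is shared.
    axes[-1].set_xticks(positions)
    axes[-1].set_xticklabels(tokens, rotation=90)
    fig.tight_layout()
    plt.show()


# Toy data (made up): four tokens with start and end probabilities.
tokens = ["de", "thraex", "en", "murmillo"]
start_probs = [0.1, 0.7, 0.1, 0.1]
end_probs = [0.05, 0.05, 0.1, 0.8]
combined = combine_scores(start_probs, end_probs)
print(combined)
```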

Solution 1
1 Accepted 2021-01-04 12:21:13