
What do you need for plotting the outcome of a question-answering model

I have been working on a question-answering model, where I get answers to my questions from the BERT word-embedding model. But I really want to plot something like this: [image: example bar plot of answer probabilities per word]

But the problem is, I don't really know how. I am really stuck on this. I don't know how to represent a part of the context in a plot. I do have two variables, named answer_start and answer_end, which indicate which part of the context the model got its answer from. Can someone please help me out with this and tell me what variables I need to put into my pyplot?

Below is my code:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
import numpy as np
import pandas as pd

max_seq_length = 512

tokenizer = AutoTokenizer.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")

questions = [
    "Welke soorten gladiatoren waren er?",
    "Wat is een provocator?"
]
for question in questions:  # iterate over all lines of the context for every question
    print(f"Question: {question}")
    f = open("test.txt", "r")
    for line in f:
        text = str(line)  # the context passed to the model must be a string
        # encode: tokenize the question/context pair
        inputs = tokenizer.encode_plus(question,
                                       text,
                                       add_special_tokens=True,
                                       max_length=max_seq_length,
                                       truncation=True,
                                       return_tensors="pt")
        input_ids = inputs["input_ids"].tolist()[0]

        # ** unpacks the encoded inputs into keyword arguments for the model;
        # with return_dict=False the model returns (start_logits, end_logits)
        answer_start_scores, answer_end_scores = model(**inputs, return_dict=False)

        answer_start = torch.argmax(
            answer_start_scores)  # index of the most likely start token
        answer_end = torch.argmax(
            answer_end_scores) + 1  # index of the most likely end token (exclusive)
        answer = tokenizer.convert_tokens_to_string(
            tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

        # skip answers that are [CLS] (no answer found) or empty (NaN)
        if answer == '[CLS]':
            continue
        elif answer == '':
            continue
        else:
            print(f"Answer: {answer}")
            print(f"Answer start: {answer_start}")
            print(f"Answer end: {answer_end}")
        f.seek(0)
        break
  
f.close()

And the output:

> Question: Welke soorten gladiatoren waren er?
> Answer: de thraex, de retiarius en de murmillo
> Answer start: 24
> Answer end: 37
> Question: Wat is een provocator?
> Answer: telemachus
> Answer start: 87
> Answer end: 90

I don't know if I fully understand what your problem is, but to make a plot similar to the one in the figure, I would do something like this:

import numpy as np
import matplotlib.pyplot as plt

plt.rcdefaults()

# words that make up the sentence in which the answer is found
sentence = ['list', 'of', 'words', 'in', 'sentence']
y_pos = np.arange(len(sentence))
# one probability per word (same length as sentence)
probability = [0.1, 0.2, 0.1, 0.8, 0.6]

plt.bar(y_pos, probability, align='center', alpha=0.5)
plt.xticks(y_pos, sentence)
plt.ylabel('Answer probability')
plt.title('Words of the sentence')

plt.show()

So, assuming the answer lies within a larger sentence/paragraph, I would put all the words of that sentence/paragraph on the x axis of a bar plot (the sentence variable, built from test.txt I suppose), and on the y axis the probability that each particular word is the beginning or ending word of the answer (the probability variable). Obviously the two variables sentence and probability must have the same length, with the first word corresponding to the first probability value and so on.

For instance, the tokens with the highest values in answer_start_scores and answer_end_scores will get the tallest bars in the plot (the highest values in the probability list).

Finally, answer_start_scores and answer_end_scores should contain, for every token, the score indicating how likely that token is to be the starting or ending word of the answer.
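
To tie this back to the code in the question: answer_start_scores and answer_end_scores are logits with one value per input token, so you can turn them into probabilities with a softmax and plot them against the actual tokens. A minimal sketch, assuming the input_ids, tokenizer and the two score tensors from the loop above are still in scope:

import numpy as np
import torch
import matplotlib.pyplot as plt

# turn the per-token logits into probabilities (shape [1, seq_len] -> [seq_len])
start_probs = torch.softmax(answer_start_scores, dim=-1)[0].detach().numpy()
end_probs = torch.softmax(answer_end_scores, dim=-1)[0].detach().numpy()

# use the actual tokens of the encoded question + context as x-axis labels
tokens = tokenizer.convert_ids_to_tokens(input_ids)
positions = np.arange(len(tokens))

plt.figure(figsize=(15, 4))
plt.bar(positions, start_probs, alpha=0.5, label='start probability')
plt.bar(positions, end_probs, alpha=0.5, label='end probability')
plt.xticks(positions, tokens, rotation=90)
plt.ylabel('Answer probability')
plt.legend()
plt.tight_layout()
plt.show()

Plotting both score sets in one figure with transparent bars lets you see at a glance where the model thinks the answer starts and ends.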

EDIT: Alternatively, you could make two separate bar plots, one for the starting word of the answer and one for the ending word, and then join them together by adding the percentages.
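
A minimal sketch of that idea, reusing the start_probs, end_probs, tokens and positions variables from the snippet above (those names are my own, not from the original code):

import matplotlib.pyplot as plt

fig, (ax_start, ax_end, ax_sum) = plt.subplots(3, 1, figsize=(15, 8), sharex=True)

# one bar plot for the probability of being the first word of the answer
ax_start.bar(positions, start_probs, alpha=0.5)
ax_start.set_ylabel('start prob.')

# one bar plot for the probability of being the last word of the answer
ax_end.bar(positions, end_probs, alpha=0.5)
ax_end.set_ylabel('end prob.')

# "join" the two plots by adding the percentages per token
ax_sum.bar(positions, start_probs + end_probs, alpha=0.5)
ax_sum.set_ylabel('start + end')

ax_sum.set_xticks(positions)
ax_sum.set_xticklabels(tokens, rotation=90)
plt.tight_layout()
plt.show()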
