Lines in text file won't iterate through for loop (Python)

Question

I am trying to iterate through my questions and the lines in my .txt file. This question may have been asked before, but I am really having trouble with it.

This is what I have right now:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

max_seq_length = 512

tokenizer = AutoTokenizer.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")

f = open("glad.txt", "r")

questions = [
    "Welke soorten gladiatoren waren er?",
    "Wat is een provocator?",
    "Wat voor helm droeg een retiarius?",
]
for question in questions:
    print(f"Question: {question}")
    for _ in range(len(question)):
        for line in f:
            text = str(line.split("."))
            inputs = tokenizer.encode_plus(question,
                                           text,
                                           add_special_tokens=True,
                                           max_length=100,
                                           truncation=True,
                                           return_tensors="pt")
            input_ids = inputs["input_ids"].tolist()[0]

            text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
            answer_start_scores, answer_end_scores = model(**inputs, return_dict=False)

            answer_start = torch.argmax(
                answer_start_scores
            )  # Get the most likely beginning of answer with the argmax of the score
            answer_end = torch.argmax(
                answer_end_scores) + 1  # Get the most likely end of answer with the argmax of the score

            answer = tokenizer.convert_tokens_to_string(
                tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

            print(text)
            # if answer == '[CLS]':
            #   continue
            # elif answer == '':
            #   continue
            # else:
            #   print(f"Answer: {answer}")
            #   print(f"Answer start: {answer_start}")
            #   print(f"Answer end: {answer_end}")
            #   break

And this is the output:

> Question: Welke soorten gladiatoren waren er?
> ['Er waren vele soorten gladiatoren, maar het meest kwamen de thraex, de retiarius en de murmillo voor', ' De oudste soorten gladiatoren droegen de naam van een volk: de Samniet en de Galliër', '\n']
> ['Hun uitrusting bestond uit dezelfde wapens als die waarmee de Samnieten en Galliërs in hun oorlogen met de Romeinen gewoonlijk vochten', '\n']
> ['De Thraciër (thraex) verscheen vanaf de tweede eeuw voor Chr', ' Hij had een vrij klein  kromzwaard (sica), een klein rond (soms vierkant) schild, een helm en lage beenplaten', " De retiarius ('netvechter') had een groot net (rete) met een doorsnee van 3 m, een drietand en soms  ook een dolk", '\n']
> ['Hij had alleen bescherming om zijn linkerarm en -schouder', ' Vaak droeg hij ook een bronzen beschermingsplaat (galerus) van zijn nek tot linkerelleboog', ' Vaak vocht de retiarius tegen de secutor die om die reden ook wel contraretiarius werd genoemd', '\n']
> ['Hij had een langwerpig schild en een steekzwaard', ' Opvallend was zijn eivormige helm zonder rand en met een metalen kam, waarschijnlijk zo ontworpen om minder makkelijk in het net van de retiarius vast te haken', ' Een provocator (‘uitdager’) vocht doorgaans tegen een andere provocator', '\n']
> ['Hij viel zijn tegenstander uit een onverwachte hoek plotseling aan', ' Hij had een lang rechthoekig schild, een borstpantser, een beenplaat alleen over het linkerbeen, een helm en een kort zwaard', '']
> Question: Wat is een provocator?
> Question: Wat voor helm droeg een retiarius?

But the sentences are supposed to repeat for the other questions too.

Does anyone know what I am doing wrong here? It is probably something really easy, but I can't seem to find the mistake.

Answer 1

Your f is just an open file, which is exhausted the first time through. I think you meant this:

f = list(open("glad.txt", "r"))
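
To see why the list helps, here is a minimal sketch that reproduces the exhaustion without loading the model. The temporary file and its two lines are stand-ins for glad.txt, not the real data, and the names first_pass, second_pass and passes are made up for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for glad.txt, written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

# A file object is a one-shot iterator: the second pass yields nothing.
with open(path, "r", encoding="utf-8") as f:
    first_pass = [line for line in f]
    second_pass = [line for line in f]

# Reading the lines into a list once lets every later loop start from the top.
with open(path, "r", encoding="utf-8") as f:
    lines = list(f)

passes = [[line for line in lines] for _ in range(3)]
os.remove(path)

print(len(first_pass), len(second_pass))
print([len(p) for p in passes])
```

Using a with block here also closes the file as soon as the lines are in memory, which list(open(...)) on its own does not do explicitly.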

Answer 2

You need to call f.seek(0) after each full pass through the file. Once the file has been read, the cursor sits at the end of the file, so a later for line in f does not start from the beginning again. Please refer to Tim and Nunser's answer here, which explains it well.

Something like this:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

max_seq_length = 512

tokenizer = AutoTokenizer.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")
model = AutoModelForQuestionAnswering.from_pretrained("henryk/bert-base-multilingual-cased-finetuned-dutch-squad2")

f = open("glad.txt", "r")

questions = [
    "Welke soorten gladiatoren waren er?",
    "Wat is een provocator?",
    "Wat voor helm droeg een retiarius?",
]
for question in questions:
    print(f"Question: {question}")
    for _ in range(len(question)):
        for line in f:
            text = str(line.split("."))
            inputs = tokenizer.encode_plus(question,
                                           text,
                                           add_special_tokens=True,
                                           max_length=100,
                                           truncation=True,
                                           return_tensors="pt")
            input_ids = inputs["input_ids"].tolist()[0]

            text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
            answer_start_scores, answer_end_scores = model(**inputs, return_dict=False)

            answer_start = torch.argmax(
                answer_start_scores
            )  # Get the most likely beginning of answer with the argmax of the score
            answer_end = torch.argmax(
                answer_end_scores) + 1  # Get the most likely end of answer with the argmax of the score

            answer = tokenizer.convert_tokens_to_string(
                tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

            print(text)
        f.seek(0) # reset cursor to beginning of the file
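
The rewind can be checked in isolation with a small sketch that leaves out the tokenizer and model. The file contents, the two questions and the seen list are placeholders invented for the example. Note that in the snippet above the rewind sits inside the range(len(question)) loop, so the file is read once per character of the question; the sketch below drops that loop and makes one pass per question, which may or may not be what the original code intended.

```python
import os
import tempfile

# Hypothetical stand-in for glad.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("Zin een.\nZin twee.\n")
    path = tmp.name

questions = ["Vraag een?", "Vraag twee?"]
seen = []  # (question, line) pairs, in the order they are visited

with open(path, "r", encoding="utf-8") as f:
    for question in questions:
        for line in f:
            # The tokenizer/model call from the snippet above would go here.
            seen.append((question, line.strip()))
        f.seek(0)  # rewind so the next question starts at the first line

os.remove(path)

for question, line in seen:
    print(question, "->", line)
```

Every question is paired with every line, which is the behaviour the question asks for.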


Answer 1
1 2020-12-19 18:30:21

Answer 2
1 accepted 2020-12-19 18:38:35
