
Is it possible to step back in a loop with Python?

I am looking for a better solution to this problem:

What I want to do is automatically rejoin the words of a book that were split by line breaks. The code I've tried is:

import re

from nltk.tokenize import word_tokenize

with open('Fr-dictionary.txt') as fr:  # open the dictionary
    dic = set(word_tokenize(fr.read().lower()))  # dictionary words; a set keeps lookups fast

with open('fr-text.txt') as tx2:  # open the text containing the separated words
    text_input = word_tokenize(tx2.read().lower())  # store the input text as tokens

# pattern for punctuation, digits and words separated by hyphens (-)
pat = re.compile(r'[.?\-",:;.?!»’()quls\d]+|\w+(?:-\w+)+')
reg = set(filter(pat.match, text_input))  # tokens that match the pattern

words_it = iter(text_input)

valid_words1 = []  # existing words
invalid_words1 = []  # invalid (non-existing) words

for w in words_it:  # loop through the tokenized text
    if w in dic:
        valid_words1.append(w)
    elif w in reg:
        valid_words1.append(w)  # append the valid items
    else:
        try:
            concatenated = w + next(words_it)  # join with the following token
        except StopIteration:
            invalid_words1.append(w)  # last token: nothing left to join with
        else:
            if concatenated in dic:
                valid_words1.append(concatenated)  # append if valid
            else:
                # append the invalid word (the consumed next token is dropped)
                invalid_words1.append(w)

a1 = ' '.join(valid_words1)  # convert the list into a string

with open('finaltext.txt', 'w') as out_file1:  # output file
    out_file1.write(a1)

print(a1)  # print the list converted into text

print(invalid_words1)
print(len(invalid_words1))

With this code I have:

a) tokenized the text (into a list) and looped through the list, checking whether each item exists in a dictionary (including punctuation);
b) if not, tried to concatenate the two parts of the word;
c) checked whether the concatenated output exists in the dictionary;
d) if so, appended it to the same list of valid words;
e) if not, appended it to another list with the invalid words.

PROBLEM: Sometimes the first part of a word to be concatenated is itself an existing, valid word (it exists in the dictionary). The program then accepts it as is and never concatenates it with its second part, which leaves these errors in the text. Any idea how to resolve this? I think the solution could be: loop and append all the words that exist, and when a non-existing word appears, go back to the previous word, concatenate, check the result in the dictionary, and then continue. How can I do that?

I'm not sure I understood your problem, but one way to step back in a loop in Python is simply to save the last state of the loop, namely:

last = None  # there is no previous item before the first iteration
for i in list_:
    # do stuff; `last` holds the previous item here
    last = i
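
Applied to the question, the "last state" can be the last word already kept in the valid list. The following is only a minimal sketch of the idea described in the question (step back, concatenate, check the dictionary), not a tested solution for real text. The function name `merge_with_previous` and the tiny sample dictionary are assumptions made for illustration.

```python
def merge_with_previous(tokens, dictionary):
    """Keep valid tokens; when a token is not a word, try joining it
    to the previously kept word (hypothetical helper for illustration)."""
    valid = []
    invalid = []
    for token in tokens:
        if token in dictionary:
            valid.append(token)
        elif valid and valid[-1] + token in dictionary:
            # Step back: replace the previously kept word with the joined one.
            valid[-1] = valid[-1] + token
        else:
            invalid.append(token)
    return valid, invalid


# Made-up sample data: "par" is a word on its own, "tir" is not listed.
dictionary = {"il", "veut", "par", "partir", "demain"}
tokens = ["il", "veut", "par", "tir", "demain"]
valid, invalid = merge_with_previous(tokens, dictionary)
print(valid)    # ['il', 'veut', 'partir', 'demain']
print(invalid)  # []
```

This only covers the case where the second half is not a dictionary word by itself. If both halves are valid words, the sketch keeps them separate.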

Or you can use the enumerate function:

for index, i in enumerate(list_):
    # index - 1 would wrap around to the last element when index is 0
    previous = list_[index - 1] if index > 0 else None
    # do stuff
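
Indexing with `index - 1` needs care at the first element, because a negative index in Python wraps around to the end of the list. One possible way to avoid indexing altogether is a small generator that pairs each item with its predecessor. The name `with_previous` is a hypothetical helper, not a built-in.

```python
def with_previous(items):
    """Yield (previous, current) pairs; previous is None for the first item."""
    previous = None
    for current in items:
        yield previous, current
        previous = current


pairs = list(with_previous(["a", "b", "c"]))
print(pairs)  # [(None, 'a'), ('a', 'b'), ('b', 'c')]
```

Because it never indexes into the sequence, this also works on iterators such as file objects.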
