Why do I miss some iterations in Python?

I have a part-of-speech (POS) tagged parallel corpus with 25 files in the source language in directory1 and 25 files in the target language in directory2. Each file contains 1000 lines, i.e. 25000 lines per directory.

Task at hand: I want to remove the POS tags and then write all the source-language text into one text file and all the target-language text into another, say source.txt and target.txt.

Fortunately, I managed to do this (see code below), but when I run the code, source.txt or target.txt sometimes has 24896 lines or 24871 lines etc. instead of 25000. After running the code 2-3 times I get 25000 lines in both files.

Sample POS tagged input: Need\\VBN of\\IN delivery\\NN with\\IN operation\\NN .\\.
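
For reference, the tag-stripping step on its own can be sketched as below. The function name strip_pos_tags is hypothetical (it is not in the original code); the expression inside it is the one used in the code further down, and the sample string is copied from the line above. Note that "\\" in Python source is a single backslash character, so the split works whether a token carries one backslash or two before its tag.

```python
def strip_pos_tags(line):
    """Return the line with each token's backslash-delimited POS tag removed."""
    # Split each whitespace-separated token on the first backslash
    # and keep only the word in front of it.
    return " ".join(token.split("\\")[0] for token in line.split())


# Raw string, so the backslashes are kept exactly as shown in the sample.
sample = r"Need\\VBN of\\IN delivery\\NN with\\IN operation\\NN .\\."
print(strip_pos_tags(sample))  # Need of delivery with operation .
```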

This behavior is mysterious to me (non-CS grad). Is there any explanation for it, or is it just like that?

Pardon me if it is a dumb question!

import os

outfile1 = open("source.txt",'w')
outfile2 = open("target.txt",'w')

path = '/somePath/'
file_names = []; tempDataSrc = []; tempDataTrg = []

for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith('.txt'):
            file_names.append(os.path.join(root, file))

file_names = sorted(file_names)

for file in file_names:  
    if ("Src_" in file): # filtering source language files
        infile1 = open(file,'r')
        for line_s in infile1:
            line_s = " ".join(word.split("\\")[0] for word in line_s.split())
            tempDataSrc.append(line_s)

for file in file_names: 
    if ("Trg_" in file): # filtering target language files
        infile2 = open(file,'r')
        for line_t in infile2:
            line_t = " ".join(word.split("\\")[0] for word in line_t.split())
            tempDataTrg.append(line_t)

for line1 in tempDataSrc:
    outfile1.write(line1+'\n')

for line2 in tempDataTrg:
    outfile2.write(line2+'\n')

NOTE: I have a conda installation with Python 3.6. I am running my code in the Spyder IDE; OS: Ubuntu 14.04.5.

PS: Any suggestions for writing the code in a more Pythonic way are also welcome.

I guess the behavior has to do with the environment running your program (either the IDE or the OS itself) abruptly killing the process before it has finished writing the output to the files, since you don't close your output files in the code.

You can fix that by simply calling the .close() method on "outfile1" and "outfile2" at the very end of your code.
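
To see why an unclosed file can come up short, here is a small sketch of how buffered writes behave. It assumes CPython's default buffering for files opened in text mode; the scratch file and variable names are made up for the demonstration. If the process ends without the last buffer being flushed, the tail of the output is lost, which would be consistent with a file that is missing roughly a hundred lines at the end.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.txt")

outfile = open(demo_path, "w")
outfile.write("hello\n")                  # lands in an in-memory buffer first
size_before = os.path.getsize(demo_path)  # size on disk while the file is still open

outfile.close()                           # flushes the buffer, then closes the file
size_after = os.path.getsize(demo_path)   # size on disk once the data is written out

# With default buffering, size_before is typically 0 for a write this small.
print(size_before, size_after)
```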

But, as you asked for input on doing things in a more Pythonic way: since you only write to the output at the end of your script, it makes sense to open the files near that part of the code as well. And while we are at it, you might as well use the with statement to create and write to both files. That will ensure all data produced is flushed to disk even in the case of early termination due to other errors:

with open("source.txt",'w') as outfile1:
    for line1 in tempDataSrc:
        outfile1.write(line1+'\n')

with open("target.txt",'w') as outfile2:
    for line2 in tempDataTrg:
        outfile2.write(line2+'\n')

(The with statement will automatically flush the data and close the files.)
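
Going one step further, the whole script could be restructured along these lines. This is only a sketch, not the original author's code: the names strip_pos_tags and merge_corpus are hypothetical, and it assumes the "Src_" / "Trg_" marker appears in the file name itself (the original code tests the full path). It uses with for the input files too, and writes each line as it is read instead of collecting everything in lists first.

```python
import os


def strip_pos_tags(line):
    """Return the line with each token's backslash-delimited POS tag removed."""
    return " ".join(token.split("\\")[0] for token in line.split())


def merge_corpus(root_dir, marker, out_path):
    """Write the tag-stripped lines of every matching .txt file to out_path.

    Files are processed in sorted path order. Returns the number of lines written.
    """
    file_names = sorted(
        os.path.join(root, name)
        for root, _dirs, files in os.walk(root_dir)
        for name in files
        if name.endswith(".txt") and marker in name
    )
    count = 0
    # Both with blocks close their files even if an exception is raised.
    with open(out_path, "w") as outfile:
        for file_name in file_names:
            with open(file_name) as infile:
                for line in infile:
                    outfile.write(strip_pos_tags(line) + "\n")
                    count += 1
    return count
```

With the paths from the question, this would be called as merge_corpus('/somePath/', 'Src_', 'source.txt') and merge_corpus('/somePath/', 'Trg_', 'target.txt'). The returned count gives a quick check against the expected 25000 lines.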
