
Iterate over a file, taking 5 lines per iteration

I've created 5 threads to process the lines. I pass one line as an argument to each thread.

The output is just what I need, but then the script stops with an error.

The code:

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-

import threading

# Create class myThread as subclass of Thread
class MyThread(threading.Thread):
    def __init__(self, num, myArg):
        threading.Thread.__init__(self)
        self.num = num
        self.myArg = myArg

    # Override run() to define what the thread does.
    def run(self):
        print ("I'm thread number: ", self.num)
        print(self.myArg)


myFile = open('file_01.txt', mode='r')

for myLine in myFile:
    for h in range(1, 6):    # create 5 instances of the thread
        t = MyThread(h, myLine)
        t.start()
        myLine = myFile.__next__()

myFile.close()

The error:

Traceback (most recent call last):
  File "/sajime/PycharmProjects/Learning/iterarFichero.py", line 25, in <module>
    myLine = myFile.__next__()
StopIteration

The content of 'file_01.txt' is simple 'Lorem ipsum dolor sit amet,...' filler text.

The bug isn't in the multi-threading class or in the calls to it. It comes from the iteration over the file, but why?

For those asking why I need this: the script must process the lines to load data into web forms, and that takes a lot of time (the server lags). I realized that it is faster if I divide the tasks. (I don't know if there is a better method to do it.)

Try this:

for count, myLine in enumerate(myFile):
    t = MyThread(count % 5 + 1, myLine)
    t.start()

With myLine = myFile.__next__(), you advance the myFile iterator. When the iterator is fully consumed, it raises a StopIteration exception as a signal.

You can catch that, and simply break the loop, since you know you're done.

Unfortunately, there is a logic error in your program, too: you advance the iterator after every thread start, but also in the outer loop. That means that after the fifth thread has started, one more line is read into myLine, and the outer loop immediately overwrites it, so that line is never processed.

To avoid that (and to have less code), you can replace the whole inner and outer loop with something like
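A minimal sketch of catching the exception, not the asker's code: the helper name read_in_batches and its batch_size parameter are made up for illustration, and any iterable of lines (an open file, for instance) can be passed in. It collects up to five lines at a time and stops cleanly when the iterator runs out.

```python
def read_in_batches(lines, batch_size=5):
    """Yield lists of up to batch_size lines (hypothetical helper)."""
    iterator = iter(lines)
    while True:
        batch = []
        try:
            for _ in range(batch_size):
                batch.append(next(iterator))
        except StopIteration:
            # Iterator exhausted: emit the partial batch, if any, and stop.
            if batch:
                yield batch
            break
        yield batch
```

Each batch could then be handed to five threads, one line per thread; a final batch with fewer than five lines is still delivered rather than lost.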
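To see the skipped lines concretely, here is a small sketch that mimics the question's nested loops on a plain list instead of a file. The function name lines_seen_by_threads is made up for illustration, and appending to a list stands in for starting a thread.

```python
def lines_seen_by_threads(lines):
    """Replay the question's loop structure and record which lines get used."""
    iterator = iter(lines)
    seen = []
    try:
        for line in iterator:            # outer loop advances the iterator
            for _ in range(1, 6):
                seen.append(line)        # stands in for MyThread(h, line).start()
                line = next(iterator)    # inner loop advances it again
    except StopIteration:
        pass                             # the error from the question
    return seen

sample = [f"line {n}" for n in range(1, 13)]
print(lines_seen_by_threads(sample))
```

With twelve input lines, lines 6 and 12 never reach a thread: each is read by the inner loop after the fifth start and then replaced when the outer loop fetches its own next line.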

[MyThread(i % 5 + 1, myLine).start() for i, myLine in enumerate(myFile)]

from itertools import cycle
# Python 3: the built-in zip is already lazy (izip exists only in Python 2)
for h, myLine in zip(cycle(range(1, 6)), myFile):
    t = MyThread(h, myLine)
    t.start()

Does this do what you want?

It's because you're advancing to the next line twice in each loop.

The for loop in your code iterates through the lines by calling next each time. Then you're calling it again within your loop.

Remove this line:

myLine = myFile.__next__()

So the final loop becomes:

h = 0
for myLine in myFile:
    # h % 5 is in 0..4, so the thread number is in 1..5
    t = MyThread(h % 5 + 1, myLine)
    t.start()
    h += 1

The % operator takes the remainder of the division, so the thread number always stays within 1 to 5, matching range(1, 6) in the question.
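On the question's aside about whether there is a better method: the loops above start one new thread per line, so the thread number only labels the output and does not limit how many threads run at once. A possible alternative, sketched here under assumptions, is the standard library's ThreadPoolExecutor with five workers. The function process_line is a hypothetical stand-in for the slow web-form step, which the question does not show.

```python
from concurrent.futures import ThreadPoolExecutor

def process_line(line):
    # Hypothetical placeholder for the slow web-form submission.
    return line.strip().upper()

def process_all(lines, workers=5):
    """Process every line using at most `workers` threads at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands lines to idle workers and returns results in input order
        return list(pool.map(process_line, lines))
```

Usage would look like process_all(open('file_01.txt')), ideally inside a with block so the file gets closed. The pool reuses the same five threads for the whole file, so no manual iterator handling is needed.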
