Avoid waiting for threads to finish in Python
I've written this script to read data from a txt file and process it. But it seems that if I give it a big file and a high number of threads, the more it reads from the list, the slower the script gets. Is there a way to avoid waiting for all the threads to finish, and instead start a new one whenever a thread is done with its work? Also, it seems that when it finishes processing, the script doesn't exit.
import threading, Queue, time

class Work(threading.Thread):
    def __init__(self, jobs):
        threading.Thread.__init__(self)
        self.Lock = threading.Lock()
        self.jobs = jobs

    def myFunction(self):
        # simulate work
        self.Lock.acquire()
        print("Firstname: " + self.firstname + " Lastname: " + self.lastname)
        self.Lock.release()
        time.sleep(3)

    def run(self):
        while True:
            self.item = self.jobs.get().rstrip()
            self.firstname = self.item.split(":")[0]
            self.lastname = self.item.split(":")[1]
            self.myFunction()
            self.jobs.task_done()

def main(file):
    jobs = Queue.Queue()
    myList = open(file, "r").readlines()
    MAX_THREADS = 10
    pool = [Work(jobs) for i in range(MAX_THREADS)]
    for thread in pool:
        thread.start()
    for item in myList:
        jobs.put(item)
    for thread in pool:
        thread.join()

if __name__ == '__main__':
    main('list.txt')
The script probably seems to take longer on larger inputs because there's a 3-second pause between each batch of printing.
The issue with the script not finishing is: since you are using a Queue, you need to call join() on the Queue, not on the individual threads. To make sure that the script returns when the jobs have stopped running, you should also set daemon = True.
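A minimal sketch of what daemon = True buys you (the worker name and loop here are illustrative, not from the question): a daemon thread stuck in an infinite loop does not prevent the interpreter from exiting when the main thread finishes.

```python
import threading
import time

def spin():
    # A worker that never returns, like the `while True` loop in run().
    while True:
        time.sleep(0.1)

t = threading.Thread(target=spin)
t.daemon = True  # without this line, the interpreter would wait on t forever
t.start()

# The main thread can now exit; the daemon worker is simply abandoned.
print("daemon flag set:", t.daemon)
```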
The Lock will also not work in the current code, because threading.Lock() produces a new lock each time it is called: every Work instance gets its own private self.Lock, so no two threads ever contend for the same lock object. You need to have all the jobs share the same lock.
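A small sketch of the difference (the counter and thread count are illustrative): per-instance locks are distinct objects and provide no mutual exclusion across threads, while one module-level lock serializes access for everyone.

```python
import threading

# Per-instance locks, like self.Lock in the question: three distinct objects,
# so acquiring one never blocks a thread holding another.
per_instance = [threading.Lock() for _ in range(3)]
print(len({id(lk) for lk in per_instance}))  # 3 distinct lock objects

# One shared, module-level lock: every thread contends for the same object.
shared = threading.Lock()
counter = 0

def bump():
    global counter
    for _ in range(10000):
        with shared:  # "with" acquires the lock and releases it on exit
            counter += 1

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: the increments were serialized by the shared lock
```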
If you want to use this in Python 3 (which you should), note that the Queue module has been renamed to queue.
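If you need code that runs under both versions, a common sketch is to try the Python 3 name first and fall back to the Python 2 one (the rest of this answer keeps the Python 2 spelling):

```python
try:
    import queue           # Python 3
except ImportError:
    import Queue as queue  # Python 2

# Either way, the rest of the code can use the lowercase name.
jobs = queue.Queue()
jobs.put("John:Smith")
print(jobs.get())  # John:Smith
```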
import threading, Queue, time

lock = threading.Lock()  # One lock, shared by all workers

class Work(threading.Thread):
    def __init__(self, jobs):
        threading.Thread.__init__(self)
        self.daemon = True  # Daemon threads don't keep the script alive
        self.jobs = jobs

    def myFunction(self):
        # simulate work
        lock.acquire()  # All jobs share the one lock
        print("Firstname: " + self.firstname + " Lastname: " + self.lastname)
        lock.release()
        time.sleep(3)

    def run(self):
        while True:
            self.item = self.jobs.get().rstrip()
            self.firstname = self.item.split(":")[0]
            self.lastname = self.item.split(":")[1]
            self.myFunction()
            self.jobs.task_done()

def main(file):
    jobs = Queue.Queue()
    with open(file, 'r') as fp:  # Close the file when we're done
        myList = fp.readlines()
    MAX_THREADS = 10
    pool = [Work(jobs) for i in range(MAX_THREADS)]
    for thread in pool:
        thread.start()
    for item in myList:
        jobs.put(item)
    jobs.join()  # Join the Queue, not the threads

if __name__ == '__main__':
    main('list.txt')
Simpler example (based on an example from the Python docs):
import threading
import time
from Queue import Queue  # Py2
# from queue import Queue  # Py3

lock = threading.Lock()

def worker():
    while True:
        item = jobs.get()
        if item is None:
            break
        firstname, lastname = item.split(':')
        lock.acquire()
        print("Firstname: " + firstname + " Lastname: " + lastname)
        lock.release()
        time.sleep(3)
        jobs.task_done()

jobs = Queue()
pool = []
MAX_THREADS = 10
for i in range(MAX_THREADS):
    thread = threading.Thread(target=worker)
    thread.start()
    pool.append(thread)

with open('list.txt') as fp:
    for line in fp:
        jobs.put(line.rstrip())

# block until all tasks are done
jobs.join()

# stop workers
for i in range(MAX_THREADS):
    jobs.put(None)
for thread in pool:
    thread.join()
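On Python 3, the same pattern can also be expressed with concurrent.futures.ThreadPoolExecutor, which hands a free worker the next item as soon as it finishes and waits for everything on exit, so you never manage sentinels or join threads by hand. A sketch, using an in-memory list in place of list.txt:

```python
import concurrent.futures

def process(item):
    # Same per-line work as the worker above, but returning a value
    # instead of printing under a lock.
    firstname, lastname = item.split(":")
    return "Firstname: " + firstname + " Lastname: " + lastname

lines = ["John:Smith", "Jane:Doe"]  # stand-in for the lines of list.txt

# Leaving the "with" block shuts the pool down and waits for all tasks.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(process, lines))

for r in results:
    print(r)
```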