Can't start new thread

I'm processing all the files in a directory, using multiple threads to process each file in parallel. It all works fine, except that the threads seem to stay alive, so the thread count of the process keeps going up until it reaches 1K or so threads, and then it throws "thread.error: can't start new thread". I know this error is caused by an OS-level limit on thread count, but I can't figure out where the bug is that keeps the threads alive. Any idea? Here is a minimal version of my code.

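import os
from threading import Thread
from Queue import Queue  # Python 2 module name; on Python 3 use: from queue import Queue


class Worker(Thread):
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.start()

    def run(self):
        # Loops forever: each worker blocks on the queue until a task arrives.
        while True:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception as e:
                print(e)
            self.tasks.task_done()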


class ThreadPool:
    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads): Worker(self.tasks)

    def add_task(self, func, *args, **kargs):
        self.tasks.put((func, args, kargs))

    def wait_completion(self):
        self.tasks.join()


def foo(filename):
    pool = ThreadPool(32)
    iterable_data = process_file(filename)

    for data in iterable_data:
        pool.add_task(some_function, data)
    pool.wait_completion()

files = os.listdir(directory)
for file in files:
    foo(file)

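You are creating a new ThreadPool with 32 threads for every file. Each Worker loops forever in run(), blocked on self.tasks.get(), so those threads never exit once the file is done. With a large number of files, the thread count keeps climbing until the OS refuses to start more. Also, since only one thread at a time can execute Python bytecode in CPython (because of the Global Interpreter Lock), adding threads will not necessarily make CPU-bound processing any faster.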
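The thread growth can be reproduced in isolation. The sketch below is written for Python 3 (so `queue` rather than `Queue`) and copies the question's Worker and ThreadPool classes, with `task_done()` moved into a `finally` clause. `leak_demo` is a hypothetical helper that is not part of the question's code: it builds a fresh pool per simulated file, the way foo() does, and reports how many extra threads are alive afterwards.

```python
import threading
from queue import Queue
from threading import Thread


class Worker(Thread):
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.start()

    def run(self):
        while True:  # never returns, so the thread never exits
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception as e:
                print(e)
            finally:
                self.tasks.task_done()


class ThreadPool:
    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            Worker(self.tasks)

    def add_task(self, func, *args, **kargs):
        self.tasks.put((func, args, kargs))

    def wait_completion(self):
        self.tasks.join()


def leak_demo(num_files, threads_per_pool):
    """Hypothetical helper: one pool per 'file', then count the leftover threads."""
    before = threading.active_count()
    for _ in range(num_files):
        pool = ThreadPool(threads_per_pool)  # same pattern as foo()
        pool.add_task(lambda: None)          # trivial stand-in task
        pool.wait_completion()
    return threading.active_count() - before


if __name__ == "__main__":
    print(leak_demo(num_files=5, threads_per_pool=4))
```

With 5 simulated files and 4 threads per pool, the helper should report 20 leftover threads: every worker from every pool is still blocked in `get()`, even though all tasks have completed.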

Move the creation of the ThreadPool outside of the foo() function, so that a single pool is created once and reused for every file.
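As a rough illustration of that structure, here is a Python 3 sketch that creates the pool once and passes it into foo(). It swaps the hand-written ThreadPool for the standard library's `concurrent.futures.ThreadPoolExecutor`, which is not what the question uses; the same shape applies to the question's own class (construct it before the loop, pass it to foo()). `process_file` and `some_function` are placeholder stand-ins for the question's functions, and `process_all` is a hypothetical name.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def process_file(filename):
    # Placeholder for the question's process_file(); yields fake work items.
    return [(filename, i) for i in range(3)]


def some_function(data):
    # Placeholder for the question's some_function().
    return data


def foo(pool, filename):
    # Reuse the pool passed in by the caller instead of building a new one.
    futures = [pool.submit(some_function, data) for data in process_file(filename)]
    wait(futures)  # block until every task for this file has finished
    return [f.result() for f in futures]


def process_all(filenames, num_threads=32):
    # One pool for the whole run; leaving the with-block joins its threads.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return [foo(pool, name) for name in filenames]


if __name__ == "__main__":
    before = threading.active_count()
    results = process_all(["a.txt", "b.txt"], num_threads=4)
    print(len(results), threading.active_count() - before)
```

Unlike the question's Worker, which prints exceptions and carries on, `f.result()` re-raises any exception from the task, so error handling would need adjusting to match the original behaviour.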
