I'm processing all the files in a directory, using multiple threads to handle files in parallel. It works fine, except that the threads seem to stay alive, so the thread count of the process climbs until it reaches roughly 1,000 threads, at which point it raises thread.error: can't start new thread. I know this error is caused by an OS-level limit on the number of threads, but I can't figure out which bug is keeping the threads alive. Any ideas? Here is a minimal version of my code.
from threading import Thread
from queue import Queue  # Queue.Queue on Python 2
import os

class Worker(Thread):
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.start()

    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception as e:
                print(e)
            self.tasks.task_done()

class ThreadPool:
    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            Worker(self.tasks)

    def add_task(self, func, *args, **kargs):
        self.tasks.put((func, args, kargs))

    def wait_completion(self):
        self.tasks.join()

def foo(filename):
    pool = ThreadPool(32)
    iterable_data = process_file(filename)
    for data in iterable_data:
        pool.add_task(some_function, data)
    pool.wait_completion()

files = os.listdir(directory)
for file in files:
    foo(file)
You are launching a new ThreadPool with 32 threads for every file. Each Worker is a daemon thread blocked in an infinite loop on tasks.get(), and nothing ever tells it to exit, so every call to foo() permanently leaks 32 threads; with a large number of files you hit the OS limit quickly. And since only one thread at a time can execute Python bytecode in CPython (because of the Global Interpreter Lock), all those extra threads don't necessarily make it faster anyway.
Move the creation of the ThreadPool outside of the foo() function and reuse a single pool for all files.
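As a sketch of that fix, here is roughly the same structure using the standard library's concurrent.futures.ThreadPoolExecutor, which also takes care of shutting the worker threads down. The process_file, some_function, and filename values below are hypothetical stand-ins for the ones in your code:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the question's process_file / some_function.
def process_file(filename):
    # Pretend each file yields three data items.
    return [(filename, i) for i in range(3)]

def some_function(data):
    return data

filenames = ["a.txt", "b.txt"]  # stands in for os.listdir(directory)

# One pool, created once and shared by every file: its 32 threads are
# reused across files and joined when the with-block exits, instead of
# 32 new daemon threads leaking on every call.
with ThreadPoolExecutor(max_workers=32) as pool:
    futures = [pool.submit(some_function, data)
               for filename in filenames
               for data in process_file(filename)]
    results = [f.result() for f in futures]

print(len(results))  # 6 tasks total, run by at most 32 threads
```

The same reshuffle works with your own ThreadPool class: construct it once before the loop over files and call wait_completion() once at the end.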