
Python: Handling a large number of threads?

# data is a list
import threading

Threading_list = []

class myfunction(threading.Thread):

    def __init__(self, val):
        .......
    .......

    def run(self):
        .......
        .......

for i in range(100000):
    t = myfunction(data[i])  # need to execute this function on every datapoint
    t.start()
    Threading_list.append(t)

for t in Threading_list:
    t.join()

This will create around 100,000 threads, but I am only allowed to create a maximum of 32 threads. What modifications can be made to this code?

Python threads rarely need to be created in such numbers; in fact, I can hardly imagine a reason for it. There are suitable architectural patterns for writing code that executes in parallel while limiting the number of threads. One of them is the reactor pattern.
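For illustration, here is a minimal sketch of one simple way to cap the thread count: a fixed pool of worker threads pulling tasks from a queue (this is not the reactor itself, just a bounded worker pool; the 32-worker limit comes from the question, and process() and the stand-in data list are assumptions, not the asker's real code):

import queue
import threading

NUM_WORKERS = 32                      # the limit mentioned in the question
data = list(range(100000))            # stand-in for the asker's list

def process(val):
    # stand-in for whatever run() was doing with each datapoint
    return val * val

tasks = queue.Queue()

def worker():
    while True:
        val = tasks.get()
        if val is None:               # sentinel: no more work for this worker
            break
        process(val)

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for item in data:
    tasks.put(item)
for _ in threads:
    tasks.put(None)                   # one sentinel per worker

for t in threads:
    t.join()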

What are you trying to do?

And remember that, due to the GIL, Python threads do not give any performance boost for computational tasks, even on multiprocessor and multi-core systems (by the way, can there be a 100,000-core system? I doubt it. :)). The only chance for a boost is when the computational part is performed inside modules written in C/C++ that do their work without holding the GIL. Usually Python threads are used to parallelize the execution of code that contains blocking I/O operations.
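As an illustration of that point: if the work really is CPU-bound, processes sidestep the GIL. A minimal sketch using the standard-library multiprocessing module follows; compute() is an assumed stand-in for the real computation, not code from the question:

import multiprocessing

def compute(val):
    # assumed stand-in for the real CPU-bound computation
    return val * val

if __name__ == "__main__":
    data = list(range(100000))
    with multiprocessing.Pool(processes=32) as pool:
        results = pool.map(compute, data)
    print(results[:10])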

UPD: Noticed the stackless-python tag. AFAIK, it supports microthreads. However, it is still unclear what you are trying to do.

And if you are just trying to process 100,000 values (applying a formula to each of them?), it is better to write something like:

def myfunction(val):
    ....
    return something_calculated_from_val

results = [myfunction(d) for d in data] # you may use "map(myfunction, data)" instead

This should be much better, unless myfunction() performs some blocking I/O. If it does, a ThreadPoolExecutor may really help.

Here is an example that applies a simple function (here 2**x) to every element of a list of any length, using 32 threads through a ThreadPoolExecutor. As Ellioh said, you may not want to use threads in some cases, so you can easily switch to a ProcessPoolExecutor.

import concurrent.futures

def my_function(x):
    return 2**x

data = [1, 6, 9, 3, 8, 4, 213, 534]

with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
    result = list(executor.map(my_function, data))

print(result)
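For completeness, switching to processes only changes the executor class. A sketch of the ProcessPoolExecutor variant mentioned above follows (the __main__ guard is added because worker processes re-import the module on some platforms):

import concurrent.futures

def my_function(x):
    return 2 ** x

data = [1, 6, 9, 3, 8, 4, 213, 534]

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=32) as executor:
        result = list(executor.map(my_function, data))
    print(result)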
