Python - Convert threading to Thread pool?

Question

I have an existing script that runs well using threading, but my list of items keeps growing, and I need to limit how many threads run at once because I am at the point of killing my server. I would like to add a Pool(100) to this script, but everything I have tried so far fails with an error. Can anyone help add a simple pool to this? I have been looking around, and a lot of the pool implementations are very complicated; I would rather keep this as simple as possible. Please note that I removed the actual body of "def work(item)" because the script is fairly large.

import time, os, re, threading, subprocess, sys

mylist = open('list.txt', 'r')

class working (threading.Thread):
        def __init__(self, item):
                threading.Thread.__init__(self)
                self.item = item
        def run(self):
                work(self.item)

def work(item):
        <actual work that needs to be threaded>

threads = []
for l in mylist:
        work1 = l.strip()
        thread = working(work1)
        threads.append(thread)
        thread.start()
for t in threads: t.join()
mylist.close()

Error I get when adding pools:

Process PoolWorker-10:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib64/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 71, in worker
    put((job, i, result))
  File "/usr/lib64/python2.6/multiprocessing/queues.py", line 366, in put
    return send(obj)
UnpickleableError: Cannot pickle <type 'thread.lock'> objects

New code, which just exits without doing any work:

import time, os, re, threading, subprocess, sys
from multiprocessing.dummy import Pool as ThreadPool 

mylist = open('list.txt', 'r')

class working (threading.Thread):
        def __init__(self, item):
                threading.Thread.__init__(self)
                self.item = item
        def run(self):
                work(self.item)

def work(item):
        <actual work that needs to be threaded>

threads = []
for l in mylist:
        work1 = l.strip()
        pool = ThreadPool(10)
        pool.map(working, work1)
        pool.close()

Answer 1

multiprocessing is a process-based, high-level parallelism package. To use processes, you need to be able to send data between them, which means the data has to be pickled. The error message is telling you that this is not possible for some of your data, here a thread lock (pickleable = transferable). However, if you read the module docs at:

https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.dummy

you will find mention of something called multiprocessing.dummy. Import that and you get the same interface, backed by threads instead of processes. That's what you want.
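To make the pickling point concrete, here is a minimal sketch of why a process-based pool rejects a thread lock. The helper name can_pickle is hypothetical and only for illustration; the exact exception type differs between Python versions, so the sketch catches any exception.

```python
import pickle
import threading

def can_pickle(obj):
    """Return True if obj can be pickled, which a process pool needs
    in order to send it to or from a worker process."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:  # exception type varies across Python versions
        return False

print(can_pickle("plain string"))    # True
print(can_pickle(threading.Lock()))  # False
```

A thread-backed pool never pickles anything, because all workers share the same memory, so objects holding locks are not a problem there.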

Edit:

Take the time to read the documentation of the multiprocessing module. What you're doing is submitting the creation of thread objects to the pool; they are constructed but never started. What you want is to submit the function that does the work and the items on which to perform it. The (conceptually) correct solution looks like this:

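from multiprocessing.dummy import Pool as ThreadPool

def work(item):
    item = item.strip()
    <actual work that needs to be threaded>

pool = ThreadPool(10)             # at most 10 items are processed at a time
results = pool.map(work, mylist)  # blocks until every item has been processed
pool.close()                      # no more work will be submitted
pool.join()                       # wait for the worker threads to exit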

You don't submit Threads to the pool, but you give work to the threads contained in the pool. It's a higher level abstraction. Hope this clears things up.
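For reference, a self-contained version of the same idea might look like the sketch below. The body of work is a hypothetical stand-in (it only strips and upper-cases the line), the in-memory list replaces list.txt, and run_all is an illustrative helper name, not part of the original script.

```python
from multiprocessing.dummy import Pool as ThreadPool  # Pool API backed by threads

def work(item):
    # Hypothetical stand-in for the real work done per line.
    item = item.strip()
    return item.upper()

def run_all(lines, num_threads=10):
    """Apply work() to every line using at most num_threads threads."""
    pool = ThreadPool(num_threads)
    try:
        # map() blocks until all items are done and keeps input order.
        return pool.map(work, lines)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker threads to finish

lines = ["alpha\n", "beta\n", "gamma\n"]  # stand-in for open('list.txt')
results = run_all(lines)
print(results)  # ['ALPHA', 'BETA', 'GAMMA']
```

Because the pool owns a fixed number of threads, the size of the input list no longer determines how many threads are running at once; only num_threads does.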
