I use Python's threading.Thread to spawn a thread that runs a small utility for every filename found by os.walk() and collects its output. I tried limiting the number of threads using:
ThreadLimiter = threading.BoundedSemaphore(3)
and
ThreadLimiter.acquire()
at the start of the run method, and
ThreadLimiter.release()
at the end of the run method. But I still get the error message below when I run the program. Any suggestions for improving this?
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
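For reference, here is a minimal sketch of the pattern described above (class and variable names are illustrative, and the actual utility call is replaced by a placeholder):

```python
import threading

ThreadLimiter = threading.BoundedSemaphore(3)
results = []

class Worker(threading.Thread):
    def __init__(self, filename):
        super().__init__()
        self.filename = filename

    def run(self):
        ThreadLimiter.acquire()       # acquired only AFTER the thread exists
        try:
            # placeholder for running the external utility on self.filename
            results.append(self.filename)
        finally:
            ThreadLimiter.release()

threads = [Worker(fn) for fn in ['a', 'b', 'c', 'd', 'e']]
for t in threads:
    t.start()   # every thread is created up front; the semaphore only gates run()
for t in threads:
    t.join()
```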
Use a thread pool and save yourself a lot of work! Here I md5sum files:
import os
import multiprocessing.pool
import subprocess as subp

def walker(path):
    """Walk the file system, yielding file names."""
    for dirpath, dirs, files in os.walk(path):
        for fn in files:
            yield os.path.join(dirpath, fn)

def worker(filename):
    """Get the md5 sum of one file."""
    p = subp.Popen(['md5sum', filename], stdin=subp.PIPE,
                   stdout=subp.PIPE, stderr=subp.PIPE)
    out, err = p.communicate()
    return filename, p.returncode, out, err

pool = multiprocessing.pool.ThreadPool(3)
for filename, returncode, out, err in pool.imap(worker, walker('.'), chunksize=1):
    print(filename, out.strip())
By the time run executes, the thread has already started, so acquiring the semaphore there only blocks the extra threads, keeping them alive but idle. A limit inside run therefore caps the number of threads finishing, not the number running - it actually makes the problem worse. Either:

- acquire the semaphore before calling start, to delay launching the threads, or
- inside the os.walk loop, keep a list of active threads and block using thread.join when there are too many, or
- use multiprocessing.pool.ThreadPool as shown above.
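The first option - acquiring the semaphore before calling start - could look like this sketch (the filenames list is a stand-in for names produced by os.walk, and the real work is a placeholder):

```python
import threading

limiter = threading.BoundedSemaphore(3)   # at most 3 threads alive at once
results = []

class Worker(threading.Thread):
    def __init__(self, filename):
        super().__init__()
        self.filename = filename

    def run(self):
        try:
            # placeholder for running the external utility on self.filename
            results.append(self.filename)
        finally:
            limiter.release()             # free a slot when this thread finishes

filenames = ['a', 'b', 'c', 'd', 'e']     # stand-in for os.walk() output
threads = []
for fn in filenames:
    limiter.acquire()                     # blocks here until a slot is free
    t = Worker(fn)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```

Because every acquire happens before the matching thread is created, never more than three threads exist at once, so the fork limit is respected.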