I've developed a tool that requires the user to provide the number of CPUs available to run it.
As part of the program, the tool calls HMMER (http://eddylab.org/software/hmmer3/3.1b2/Userguide.pdf), which is itself quite slow and needs multiple CPUs to run.
I'm unsure of the most efficient way to divide the user-specified CPUs among the HMMER jobs.
For instance, assuming the user gave N CPUs, I could run N HMMER jobs with 1 CPU each, N/2 jobs with 2 CPUs each, and so on.
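In other words, there are two knobs that multiply together: how many HMMER processes run concurrently, and the --cpu value each one gets. A minimal sketch of the relationship, where cpus_per_job is the free parameter to tune and N = 20 is just an example value:

N = 20                         # total CPUs the user specified (example value)
cpus_per_job = 2               # --cpu value handed to each HMMER process
pool_size = N // cpus_per_job  # concurrent processes; pool_size * cpus_per_job <= N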
My current solution is to arbitrarily open a pool of size N/5 and have each process in the pool call HMMER with 5 CPUs:
import multiprocessing

pool = multiprocessing.Pool(processes=N // 5)  # N // 5: Pool needs an int, not the float N/5 gives in Python 3
pool.map_async(run_scan, tuple(jobs))
pool.close()   # no more tasks will be submitted
pool.join()    # block until every HMMER job has finished
where run_scan calls HMMER and jobs holds the command-line arguments for each HMMER job as dictionaries.
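For concreteness, such a run_scan could look like the sketch below. This is only an illustration assuming hmmscan is the HMMER program being invoked; the job dictionary keys (hmm_database, query_file, output_file) are hypothetical names, not taken from the original code.

import subprocess

def run_scan(job):
    # Hypothetical job layout; adapt the keys to your actual dictionaries.
    cmd = [
        "hmmscan",
        "--cpu", "5",              # worker threads for this HMMER process
        "-o", job["output_file"],  # file HMMER writes its report to
        job["hmm_database"],       # profile HMM database
        job["query_file"],         # query sequences to scan
    ]
    subprocess.run(cmd, check=True)  # raise if HMMER exits with an error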
The program is very slow and I was wondering if there was a better way to do this.
Thanks
Almost always, parallelization comes at some cost in efficiency, but the cost depends strongly on the specifics of the computation, so I think the only way to answer this question is a series of experiments.
(I'm assuming memory and disk I/O aren't an issue here; I don't know much about HMMER, but its user's guide doesn't mention memory at all in the requirements section.)
Run one representative HMMER job on a single core (--cpu 1), then on two cores, four, six, ..., and see how long it takes. That will give you an idea of how well the jobs parallelize: the CPU time used (runtime * number of cores) should remain roughly constant if scaling is perfect.
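A minimal sketch of that experiment, reusing the hypothetical hmmscan invocation from above on one representative job (the file names are placeholders):

import subprocess
import time

job = {"hmm_database": "Pfam-A.hmm",   # placeholder file names
       "query_file": "queries.fasta",
       "output_file": "scan.out"}

for n_cpu in (1, 2, 4, 6):
    cmd = ["hmmscan", "--cpu", str(n_cpu), "-o", job["output_file"],
           job["hmm_database"], job["query_file"]]
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    # With perfect scaling, elapsed * n_cpu (core-seconds) stays constant.
    print(f"--cpu {n_cpu}: {elapsed:.1f} s wall, {elapsed * n_cpu:.1f} core-seconds")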