Multiprocessing with maximum CPU power in Python 3.x

Question

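I am working with the human genome, which consists of 3.2 billion characters, and I have a list of objects that I need to search for in this data. Something like this: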

result_final=[]
objects=['obj1','obj2','obj3',...]

def function(obj):
    result_1=search_in_genome(obj)
    return(result_1)

for item in objects:
    result_2=function(item)
    result_final.append(result_2)

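Searching for each object in the data takes nearly 30 seconds, and I have several thousand objects. I noticed that when I do this serially, only 7% of the CPU and 5% of the RAM are used. From what I have read, to reduce the computation time I should do the work in parallel using queuing, threading, or multiprocessing, but these seem complicated for a non-expert. Could anyone help me write Python code that runs 10 simultaneous searches, and is it possible to make Python use the maximum available CPU and RAM for multiprocessing? (I am using Python 3.3 on Windows 7 with 64 GB of RAM and a 3.5 GHz Core i7 CPU.)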

Answer 1

You can use the multiprocessing module for this:

from multiprocessing import Pool

objects=['obj1','obj2','obj3',...]

def function(obj):
    result_1=search_in_genome(obj)
    return result_1


if __name__ == "__main__":
    pool = Pool()
    result_final = pool.map(function, objects)

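This lets you spread the work across all available CPUs on your machine, because processes are not affected by the GIL. Called with no arguments, Pool() starts as many worker processes as the machine reports CPUs. You would not want to run many more tasks at once than there are CPUs available: once you do, things actually start to slow down, because the CPUs have to switch constantly between processes, and that carries a performance penalty.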
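The question asks for 10 simultaneous searches. As a minimal sketch, the worker count can be capped explicitly through the processes argument of Pool. The real search_in_genome is not shown in the question, so the version here is a hypothetical placeholder that counts occurrences in a small made-up string; GENOME, worker_count and parallel_search are also names invented for illustration.

```python
import os
from multiprocessing import Pool

# Hypothetical stand-in data; the real genome and search routine are not shown.
GENOME = "ACGTTGCAACGTACGT" * 1000


def search_in_genome(obj):
    """Placeholder search: count non-overlapping occurrences of obj."""
    return GENOME.count(obj)


def function(obj):
    result_1 = search_in_genome(obj)
    return result_1


def worker_count(requested):
    """Cap the requested number of workers at the number of CPUs."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, cpus))


def parallel_search(objects, requested=10):
    """Run function over objects in a process pool; results keep input order."""
    with Pool(processes=worker_count(requested)) as pool:
        return pool.map(function, objects)


if __name__ == "__main__":
    # The guard is required on Windows, where each worker re-imports this module.
    objects = ["ACGT", "TTG", "GGG"]
    print(parallel_search(objects))
```

On Windows the worker function has to live at module top level so that the worker processes can import it. If each search result is large, pool.imap may be worth trying instead of pool.map, since it yields results one at a time rather than building the whole list first.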

Answer 2

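OK, I'm not sure I fully understand your question, but here is how I would do it (note that there may be a better solution, because I'm not an expert with the Queue object):

If you want to multithread your search: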

import threading


class myThread(threading.Thread):

    def __init__(self, obj):
        threading.Thread.__init__(self)
        self.result = None
        self.obj = obj

    # Called when the thread is started
    def run(self):
        # Execute your function here
        self.result = search_in_genome(self.obj)


if __name__ == '__main__':

    result_final = []
    objects = ['obj1', 'obj2', 'obj3', ...]

    # List of threads
    listThread = []

    for item in objects:
        # Create one thread per object
        thread = myThread(item)
        # Launch that thread
        thread.start()
        # Store it in the list
        listThread.append(thread)

    # Wait for every thread to finish, then collect its result.
    # join() blocks without busy-waiting, and it still works when a
    # search legitimately returns None.
    for thread in listThread:
        thread.join()
        result_final.append(thread.result)

It would be better if someone could check and validate this code. (Sorry for my English, by the way.)
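Starting one thread per object means thousands of threads for thousands of objects. A minimal sketch of capping concurrency at 10 with concurrent.futures.ThreadPoolExecutor from the standard library follows. This is a swapped-in technique, not this answer's own method. search_in_genome is a hypothetical placeholder, and GENOME and threaded_search are names invented here. Because of the GIL, threads only speed the search up if search_in_genome spends its time in I/O or in C code that releases the GIL. For CPU-bound pure-Python work, the process pool from Answer 1 is likely the better fit.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in data; the real genome and search routine are not shown.
GENOME = "ACGTTGCAACGTACGT" * 1000


def search_in_genome(obj):
    """Placeholder search: count non-overlapping occurrences of obj."""
    return GENOME.count(obj)


def threaded_search(objects, max_workers=10):
    """Search for every object using at most max_workers threads at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the inputs
        return list(executor.map(search_in_genome, objects))


if __name__ == "__main__":
    print(threaded_search(["ACGT", "TTG", "GGG"]))
```

Leaving the with block waits for all submitted searches to finish, so no manual polling loop is needed.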

Solution 1
3, accepted, 2014-07-21 14:04:14

Solution 2
0, 2014-07-21 12:53:33
