
Multiprocessing using maximum CPU power in Python-3.x

I'm working on the human genome, which consists of 3.2 billion characters, and I have a list of objects that need to be searched for within this data. Something like this:

result_final = []
objects = ['obj1', 'obj2', 'obj3', ...]

def function(obj):
    result_1 = search_in_genome(obj)
    return result_1

for item in objects:
    result_2 = function(item)
    result_final.append(result_2)

Each object's search within the data takes nearly 30 seconds, and I have a few thousand objects. I noticed that while doing this serially only 7% of the CPU and 5% of the RAM are being used. From what I've read, to reduce the computation time I should do parallel computation using queuing, threading, or multiprocessing, but these seem complicated for non-experts. Could anybody show me how to code this in Python so that it runs 10 simultaneous searches, and is it possible to make Python use the maximum available CPU and RAM for multiprocessing? (I'm using Python 3.3 on Windows 7 with 64 GB RAM and a 3.5 GHz Core i7 CPU.)

You can use the multiprocessing module for this:

from multiprocessing import Pool

objects = ['obj1', 'obj2', 'obj3', ...]

def function(obj):
    result_1 = search_in_genome(obj)
    return result_1


if __name__ == "__main__":
    pool = Pool()
    result_final = pool.map(function, objects)

This will allow you to scale the work across all available CPUs on your machine, because processes aren't affected by the GIL. You wouldn't want to run many more tasks than there are CPUs available: once you do, you actually start slowing things down, because the CPUs then have to constantly switch between processes, which carries a performance penalty.
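To cap the number of simultaneous searches at 10, as the question asks, the pool size can be passed explicitly to the `Pool` constructor. A minimal runnable sketch, with `search_in_genome` replaced by a cheap stand-in since the real function isn't shown:

```python
from multiprocessing import Pool

def search_in_genome(obj):
    # Stand-in for the real 30-second search: just return the object's length.
    return len(obj)

def function(obj):
    result_1 = search_in_genome(obj)
    return result_1

if __name__ == "__main__":
    objects = ['obj1', 'obj22', 'obj333']
    # Limit the pool to 10 worker processes; with fewer objects than
    # workers, only len(objects) processes actually do any work.
    with Pool(processes=10) as pool:
        result_final = pool.map(function, objects)
    print(result_final)  # [4, 5, 6]
```

`pool.map` returns the results in the same order as the input list, so `result_final` lines up with `objects` just like the serial loop's output would.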

OK, I'm not sure about your question, but I would do this (note that there may be a better solution, because I'm not an expert with the Queue object):

If you want to multithread your searches:

import threading

class myThread(threading.Thread):

    def __init__(self, obj):
        threading.Thread.__init__(self)
        self.result = None
        self.obj = obj

    # Function that is called when you start your Thread
    def run(self):
        # Execute your function here
        self.result = search_in_genome(self.obj)


if __name__ == '__main__':

    result_final = []
    objects = ['obj1', 'obj2', 'obj3', ...]

    # List of Threads
    listThread = []

    for item in objects:
        # Create one thread
        thread = myThread(item)
        # Launch that Thread
        thread.start()
        # Store it in the list
        listThread.append(thread)

    # Wait for each Thread to finish, then collect its result
    for thread in listThread:
        thread.join()
        result_final.append(thread.result)

It would be great if someone could check and validate this code. (Sorry for my English, by the way.)
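Since Python 3.2, the standard library also ships `concurrent.futures`, which wraps this start/join pattern in a few lines and makes it easy to cap the number of simultaneous searches at 10. Note that for a pure-Python CPU-bound search, threads won't get around the GIL, so this mainly helps if `search_in_genome` spends its time in I/O or in C code. A sketch with a cheap stand-in for the real search, which isn't shown:

```python
from concurrent.futures import ThreadPoolExecutor

def search_in_genome(obj):
    # Stand-in for the real search: just return the object's length.
    return len(obj)

if __name__ == '__main__':
    objects = ['obj1', 'obj22', 'obj333']
    # Run at most 10 searches at a time; executor.map yields results
    # in the same order as the input.
    with ThreadPoolExecutor(max_workers=10) as executor:
        result_final = list(executor.map(search_in_genome, objects))
    print(result_final)  # [4, 5, 6]
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` gives the multiprocessing behaviour of the accepted answer with the same interface.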
