
Running program on multiple cores

I am running a Python program that uses threading to parallelise a task. The task is simple string matching: I am matching a large number of short strings against a database of long strings. To parallelise it, I split the list of short strings into as many sublists as there are cores and ran each sublist separately, on a different core. However, when I run the task on 5 or 10 cores, it takes about twice as long as on a single core. What could be the reason for that, and how can I fix it?

Edit: my code can be seen below.

import sys
import os
import csv
import re
import threading
from Queue import Queue
from time import sleep
from threading import Lock


q_in = Queue()
q_out = Queue()
lock = Lock()

def ceil(nu):
    if int(nu) == nu:
        return int(nu)
    else:
        return int(nu) + 1

def opencsv(csvv):
    with open(csvv) as csvfile:
        peptides = []
        reader = csv.DictReader(csvfile)
        k = 0
        lon = ""
        for row in reader:
            pept = str(row["Peptide"])
            pept = re.sub("\((\+\d+\.\d+)\)", "", pept)
            peptides.append(pept)
        return peptides

def openfasta(fast):
    with open(fast, "r") as fastafile:
        dic = {}
        for line in fastafile:
            l = line.strip()
            if l[0] == ">":
                cur = l
                dic[l] = ""
            else:
                dic[cur] = dic[cur] + l
        return dic

def match(text, pattern):
    text = list(text.upper())
    pattern = list(pattern.upper())
    ans = []
    cur = 0
    mis = 0
    i = 0
    while True:
        if i == len(text):
            break
        if text[i] != pattern[cur]:
            mis += 1
            if mis > 1:
                mis = 0
                cur = 0
                continue
        cur = cur + 1
        i = i + 1
        if cur == len(pattern):
            ans.append(i - len(pattern))
            cur = 0
            mis = 0
            continue
    return ans

def job(pepts, outfile, genes):
    c = 0
    it = 0
    towrite = []
    for i in pepts:
        # if it % 1000 == 0:
            # with lock:
                # print float(it) / float(len(pepts))
        it = it + 1
        found = 0
        for j in genes:
            m = match(genes[j], i)
            if len(m) > 0:
                found = 1
                remb = m[0]
                wh = j
                c = c + len(m)
                if c > 1:
                    found = 0
                    c = 0
                    break
        if found == 1:
            towrite.append("\t".join([i, str(remb), str(wh)]) + "\n")
    return towrite


def worker(outfile, genes):
    s = q_in.qsize()
    while True:
        item = q_in.get()
        print "\r{0:.2f}%".format(1 - float(q_in.qsize()) / float(s))
        if item is None:
            break #kill thread
        pepts = item
        q_out.put(job(pepts, outfile, genes))
        q_in.task_done()

def main(args):
    num_worker_threads = int(args[4])

    pept = opencsv(args[1])
    l = len(pept)
    howman = num_worker_threads
    ll = ceil(float(l) / float(howman * 100))
    remain = pept
    pepties = []
    while len(remain) > 0:
        pepties.append(remain[0:ll])
        remain = remain[ll:]
    for i in pepties:
        print len(i)
    print l

    print "Csv file loaded..."
    genes = openfasta(args[2])
    out = args[3]
    print "Fasta file loaded..."

    threads = []

    with open(out, "w") as outfile:
        for pepts in pepties:
            q_in.put(pepts)

        for i in range(num_worker_threads):
            t = threading.Thread(target=worker, args=(outfile, genes, ))
            # t.daemon = True
            t.start()
            threads.append(t)

        q_in.join() # run workers

        # stop workers
        for _ in range(num_worker_threads):
            q_in.put(None)
        for t in threads:
            t.join()
            # print(t)

    return 0
if __name__ == "__main__":
  sys.exit(main(sys.argv))

The important part of the code is the job function, where the short sequences in pepts are matched against the long sequences in genes.

This is most likely because of the GIL (Global Interpreter Lock) in CPython.

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once.

David Beazley's presentation at PyCon 2010 explains the GIL in detail. On pages 32 to 34, he explains why the same multithreaded code (doing CPU-bound computation) can perform worse on multiple cores than on a single core:

(with single core) Threads alternate execution, but switch far less frequently than you might imagine

With multiple cores, runnable threads get scheduled simultaneously (on different cores) and battle over the GIL

The results of David's experiment visualize "how thread switching gets more rapid as the number of CPUs increases".
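The effect can be checked with a small timing sketch in the spirit of that experiment. It is not taken from the presentation: the function names and the loop size are assumptions chosen for illustration, and the exact numbers depend on the machine and the CPython version. On a standard CPython build with the GIL, the threaded run should not be meaningfully faster than the sequential one, and on older versions it may be noticeably slower.

```python
from __future__ import print_function

import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL."""
    while n > 0:
        n -= 1
    return n


def time_sequential(n, repeats):
    """Run count_down(n) `repeats` times in the calling thread."""
    start = time.time()
    for _ in range(repeats):
        count_down(n)
    return time.time() - start


def time_threaded(n, repeats):
    """Run count_down(n) once in each of `repeats` threads."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(repeats)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    n = 5000000  # illustrative loop size, adjust to taste
    print("sequential: {0:.2f}s".format(time_sequential(n, 2)))
    print("threaded:   {0:.2f}s".format(time_threaded(n, 2)))
```

Both runs do the same total amount of work, so any difference between the two timings comes from thread scheduling and contention for the GIL rather than from the computation itself.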

Even though your job function contains some I/O, its three levels of nested loops (two in job and one in match) make it behave much more like a CPU-bound computation.

Changing your code to use multiprocessing will let you utilize multiple cores and may improve performance. However, how much you gain depends on the amount of computation: whether the benefit of parallelizing the computation far outweighs the overhead introduced by multiprocessing, such as inter-process communication.
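A minimal sketch of what that change could look like, using multiprocessing.Pool from the standard library. The helper names (init_worker, split_chunks, search_chunk, run_parallel) are hypothetical, and the exact substring search with str.find is only a stand-in for the match function in the question, which also tolerates one mismatch and discards peptides with several hits. The gene dictionary is handed to each worker once through the pool initializer, so it is not pickled again for every chunk.

```python
from __future__ import print_function

from multiprocessing import Pool

_genes = None  # per-process copy of the gene dictionary, set by init_worker


def init_worker(genes):
    """Runs once in each worker process and stores the gene dictionary."""
    global _genes
    _genes = genes


def split_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous sublists."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def search_chunk(peptides):
    """Return (peptide, position, gene_name) for each exact first hit.

    Stand-in for the question's match(); it does not allow mismatches.
    """
    hits = []
    for pep in peptides:
        for name, seq in _genes.items():
            pos = seq.find(pep)
            if pos != -1:
                hits.append((pep, pos, name))
    return hits


def run_parallel(peptides, genes, n_procs):
    """Search all peptides using n_procs worker processes."""
    pool = Pool(processes=n_procs, initializer=init_worker,
                initargs=(genes,))
    try:
        results = pool.map(search_chunk, split_chunks(peptides, n_procs))
    finally:
        pool.close()
        pool.join()
    # Flatten the per-chunk result lists into one list of hits.
    return [hit for chunk in results for hit in chunk]


if __name__ == "__main__":
    # Toy data for illustration only.
    genes = {">g1": "MKTAYIAKQR", ">g2": "GGGAYIAGGG"}
    peptides = ["AYIA", "KQR", "XXX"]
    for hit in run_parallel(peptides, genes, 2):
        print(hit)
```

Each worker process has its own interpreter and its own GIL, so the chunks really do run in parallel. The price is that the peptide chunks and the results have to be pickled and sent between processes, which is the inter-process communication overhead mentioned above.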
