Running program on multiple cores
I am running a Python program that uses threads to parallelize a task. The task is simple string matching: I am matching a large number of short strings against a database of long strings. When I tried to parallelize it, I decided to split the list of short strings into a number of sublists equal to the number of cores and run each sublist on a separate core. However, when I run the task on 5 or 10 cores, it is about twice as slow as on a single core. What could the reason be, and how can I fix it?
Edit: my code is below.
import sys
import os
import csv
import re
import threading
from Queue import Queue
from time import sleep
from threading import Lock
q_in = Queue()
q_out = Queue()
lock = Lock()
def ceil(nu):
    if int(nu) == nu:
        return int(nu)
    else:
        return int(nu) + 1
def opencsv(csvv):
    with open(csvv) as csvfile:
        peptides = []
        reader = csv.DictReader(csvfile)
        for row in reader:
            pept = str(row["Peptide"])
            pept = re.sub(r"\((\+\d+\.\d+)\)", "", pept)
            peptides.append(pept)
        return peptides
def openfasta(fast):
    with open(fast, "r") as fastafile:
        dic = {}
        for line in fastafile:
            l = line.strip()
            if l[0] == ">":
                cur = l
                dic[l] = ""
            else:
                dic[cur] = dic[cur] + l
        return dic
def match(text, pattern):
    text = list(text.upper())
    pattern = list(pattern.upper())
    ans = []
    cur = 0
    mis = 0
    i = 0
    while True:
        if i == len(text):
            break
        if text[i] != pattern[cur]:
            mis += 1
            if mis > 1:
                mis = 0
                cur = 0
                continue
        cur = cur + 1
        i = i + 1
        if cur == len(pattern):
            ans.append(i - len(pattern))
            cur = 0
            mis = 0
            continue
    return ans
def job(pepts, outfile, genes):
    c = 0
    it = 0
    towrite = []
    for i in pepts:
        # if it % 1000 == 0:
        #     with lock:
        #         print float(it) / float(len(pepts))
        it = it + 1
        found = 0
        for j in genes:
            m = match(genes[j], i)
            if len(m) > 0:
                found = 1
                remb = m[0]
                wh = j
                c = c + len(m)
                if c > 1:
                    found = 0
                    c = 0
                    break
        if found == 1:
            towrite.append("\t".join([i, str(remb), str(wh)]) + "\n")
    return towrite
def worker(outfile, genes):
    s = q_in.qsize()
    while True:
        item = q_in.get()
        print "\r{0:.2f}%".format(100.0 * (1 - float(q_in.qsize()) / float(s)))
        if item is None:
            break  # kill thread
        pepts = item
        q_out.put(job(pepts, outfile, genes))
        q_in.task_done()
def main(args):
    num_worker_threads = int(args[4])
    pept = opencsv(args[1])
    l = len(pept)
    howman = num_worker_threads
    ll = ceil(float(l) / float(howman * 100))
    remain = pept
    pepties = []
    while len(remain) > 0:
        pepties.append(remain[0:ll])
        remain = remain[ll:]
    for i in pepties:
        print len(i)
    print l
    print "Csv file loaded..."
    genes = openfasta(args[2])
    out = args[3]
    print "Fasta file loaded..."
    threads = []
    with open(out, "w") as outfile:
        for pepts in pepties:
            q_in.put(pepts)
        for i in range(num_worker_threads):
            t = threading.Thread(target=worker, args=(outfile, genes, ))
            # t.daemon = True
            t.start()
            threads.append(t)
        q_in.join()  # wait for all queued chunks to be processed
        # stop workers
        for _ in range(num_worker_threads):
            q_in.put(None)
        for t in threads:
            t.join()
        # print(t)
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv))
The important part of the code is inside the job function, where the short sequences in pepts are matched against the long sequences in genes.
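For reference, match returns every start index at which pattern occurs in text while tolerating at most one character mismatch per occurrence; a small standalone check (the function is repeated here so the snippet runs on its own):

```python
def match(text, pattern):
    # same approximate-matching routine as in the question's code
    text = list(text.upper())
    pattern = list(pattern.upper())
    ans = []
    cur = 0
    mis = 0
    i = 0
    while True:
        if i == len(text):
            break
        if text[i] != pattern[cur]:
            mis += 1
            if mis > 1:
                mis = 0
                cur = 0
                continue
        cur = cur + 1
        i = i + 1
        if cur == len(pattern):
            ans.append(i - len(pattern))
            cur = 0
            mis = 0
            continue
    return ans

print(match("ABCDEF", "BCD"))  # exact occurrence at index 1 -> [1]
print(match("AXC", "ABC"))     # one mismatch tolerated -> [0]
```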
This is most likely due to the GIL (Global Interpreter Lock) in CPython.

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once.

David Beazley's talk at PyCon 2010 explains the GIL in detail. On pages 32 to 34 he shows why the same multi-threaded code (doing CPU-bound computation) can perform worse when running on multiple cores than on a single core:

With a single core, threads run alternately, but switch far less frequently than you might think.

With multiple cores, runnable threads get scheduled simultaneously (on different cores) and then battle over the GIL.

David's experimental results show that thread switching becomes more rapid as the number of CPUs increases.
Even though your job function contains some I/O, judging from its three levels of nested loops (two in job and one in match), it is more like a CPU-bound computation.
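The GIL effect on CPU-bound work is easy to reproduce with a toy countdown loop; on CPython, running the work on two threads is usually no faster than running it serially, and often slower (the snippet below is a minimal sketch, not part of the question's code):

```python
import threading
import time

def count_down(n):
    # pure-Python CPU-bound loop: the GIL is held the whole time it runs
    while n > 0:
        n -= 1

N = 5000000

# the same total work done twice in a row on one thread
start = time.time()
count_down(N)
count_down(N)
serial = time.time() - start

# the same total work split across two threads
start = time.time()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.time() - start

print("serial:   %.2fs" % serial)
print("threaded: %.2fs" % threaded)
```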
Changing the code to use multiprocessing will help you utilize multiple cores and may improve performance. How much you gain, however, depends on the amount of computation: the benefit of parallelizing the computation has to outweigh the overhead multiprocessing introduces, such as inter-process communication.
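As a minimal sketch of the multiprocessing route (Python 3 syntax; find_matches is a hypothetical stand-in for the question's job function, and chunks corresponds to the pepties sublists):

```python
from multiprocessing import Pool

def find_matches(chunk):
    # hypothetical stand-in for job(): process one sublist of short
    # strings and return the output lines for that sublist
    return [s.upper() for s in chunk]

if __name__ == "__main__":
    chunks = [["abc", "def"], ["ghi"], ["jkl", "mno"]]
    # each worker is a separate process with its own interpreter and GIL
    with Pool(processes=4) as pool:
        results = pool.map(find_matches, chunks)
    print(results)
```

Note that pool.map pickles each chunk and its result, so large read-only data such as the genes dictionary is better loaded once per worker (for example via the Pool initializer argument) than passed along with every task.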