Improving a nested loop for efficiency
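I am working on a project involving PSL files. The program looks at read pairs and identifies circular molecules. It works, but because the loops are nested it is very inefficient: reading through the whole PSL file takes more than 10 minutes instead of about 15 seconds.
The relevant code is: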
def readPSLpairs(self):
    posread = []
    negread = []
    result = {}
    for psl in self.readPSL():
        parsed = psl.split()
        strand = parsed[9][-1]
        if strand == '1':
            posread.append(parsed)
        elif strand == '2':
            negread.append(parsed)
    for read in posread:
        posname = read[9][:-2]
        poscontig = read[13]
        for read in negread:
            negname = read[9][:-2]
            negcontig = read[13]
            if posname == negname and poscontig == negcontig:
                try:
                    result[poscontig] += 1
                    break
                except:
                    result[poscontig] = 1
                    break
    print(result)
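I tried changing the overall approach, rather than appending the values to lists and then matching on posname == negname and poscontig == negcontig, but that turned out to be much harder than I expected, so I am stuck trying to improve the function as a whole.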
import collections

all_dict = {"pos": collections.defaultdict(int),
            "neg": collections.defaultdict(int)}
result = {}

# pre-process all the psl's into all_dict['pos'] or all_dict['neg']
# this is basically just a `collections.Counter` of what you're doing already!
for psl in self.readPSL():
    parsed = psl.split()
    strand = "pos" if parsed[9][-1] == '1' else "neg"
    name, contig = parsed[9][:-2], parsed[13]
    all_dict[strand][(name, contig)] += 1

# process all the 'pos' psl's. For every match with a 'neg', set
# result[(name, contig)] to the total (posqty * negqty)
for info, posqty in all_dict['pos'].items():
    negqty = all_dict['neg'][info]  # (defaults to zero)
    result[info] = posqty * negqty
Note that this throws away the full parsed psl values and keeps only the `name` and `contig` slices.