Improving a nested loop for efficiency

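I am working on a project involving PSL files. The program looks at read pairs and identifies circular molecules. I have it working, but because my operations are nested it is very inefficient: reading through an entire PSL file takes more than 10 minutes instead of about 15 seconds.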

The relevant code is:

def readPSLpairs(self):

    posread = []
    negread = []
    result = {}
    for psl in self.readPSL():
        parsed = psl.split()
        strand = parsed[9][-1]
        if strand == '1':
            posread.append(parsed)
        elif strand == '2':
            negread.append(parsed)

    for read in posread:
        posname = read[9][:-2]
        poscontig = read[13]
        for read in negread:
            negname = read[9][:-2]
            negcontig = read[13]
            if posname == negname and poscontig == negcontig:
                try:
                    result[poscontig] += 1
                    break
                except:
                    result[poscontig] = 1
                    break
    print(result)

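I tried changing the overall approach, rather than appending the values to lists and then trying to match posname == negname and poscontig == negcontig, but that turned out to be much harder than I expected, so I am stuck trying to improve how all of this works.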

import collections

all_dict = {"pos": collections.defaultdict(int),
            "neg": collections.defaultdict(int)}

result = {}

for psl in self.readPSL():
    parsed = psl.split()
    strand = "pos" if parsed[9][-1]=='1' else "neg"
    name, contig = parsed[9][:-2], parsed[13]
    all_dict[strand][(name,contig)] += 1
# pre-process all the psl's into all_dict['pos'] or all_dict['neg']
#   this is basically just a `collections.Counter` of what you're doing already!

for info, posqty in all_dict['pos'].items():
    negqty = all_dict['neg'][info]  # (defaults to zero)
    result[info] = posqty * negqty
# process all the 'pos' psl's. For every match with a 'neg', set
#   result[(name, contig)] to the total (posqty * negqty)

Note that this discards the full parsed psl value and keeps only the name and contig slices.
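The code above keys its result by (name, contig) and multiplies the two counts, whereas the loop in the question keeps one count per contig and counts each positive read at most once. A minimal sketch that keeps that per-contig output, but replaces the inner loop with a set lookup, might look like the following. The names `count_pairs_per_contig` and `make_line` are hypothetical, the function takes an iterable of lines instead of calling `self.readPSL()`, and the field positions (9 for the read name, 13 for the contig) are simply copied from the question's code.

```python
from collections import Counter


def count_pairs_per_contig(psl_lines):
    """Per contig, count the positive reads that have a matching negative read."""
    pos_keys = []     # (name, contig) for every read whose name ends in '1'
    neg_keys = set()  # (name, contig) for every read whose name ends in '2'

    # Single pass over the file: split each line once and file it by strand.
    for psl in psl_lines:
        parsed = psl.split()
        name, strand, contig = parsed[9][:-2], parsed[9][-1], parsed[13]
        if strand == '1':
            pos_keys.append((name, contig))
        elif strand == '2':
            neg_keys.add((name, contig))

    # Set membership is a hash lookup, so this replaces the inner loop
    # over every negative read with one check per positive read.
    result = Counter()
    for name, contig in pos_keys:
        if (name, contig) in neg_keys:
            result[contig] += 1
    return dict(result)


def make_line(read_name, contig):
    """Hypothetical helper: build a 14-field line with placeholder values."""
    fields = ['0'] * 14
    fields[9] = read_name
    fields[13] = contig
    return '\t'.join(fields)


# Made-up example data, not taken from a real PSL file.
lines = [
    make_line('readA/1', 'contig1'),
    make_line('readA/2', 'contig1'),
    make_line('readB/1', 'contig1'),
    make_line('readB/2', 'contig2'),  # mate maps to a different contig
    make_line('readC/1', 'contig2'),
    make_line('readC/2', 'contig2'),
]
print(count_pairs_per_contig(lines))  # {'contig1': 1, 'contig2': 1}
```

With this shape the work grows with the number of reads rather than with the product of the positive and negative read counts, which is where the nested loop spends its time.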
