Improving a nested loop for efficiency

Question

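I am working on a project involving PSL files. Overall, the program looks at read pairs and identifies circular molecules. I have the program working, but because my operations are nested it is very inefficient: reading the whole PSL file takes more than 10 minutes instead of about 15 seconds.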

The relevant code is:

def readPSLpairs(self):

    posread = []
    negread = []
    result = {}
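    for psl in self.readPSL():
        parsed = psl.split()
        # last character of the read name (qName, column 10): mate '1' or '2'
        strand = parsed[9][-1]
        if strand == '1':
            posread.append(parsed)
        elif strand == '2':
            negread.append(parsed)

    # For each '1' read, scan every '2' read: O(len(posread) * len(negread))
    for pos in posread:
        posname = pos[9][:-2]   # read name without its two-character mate suffix
        poscontig = pos[13]     # target (contig) name, column 14
        for neg in negread:
            negname = neg[9][:-2]
            negcontig = neg[13]
            if posname == negname and poscontig == negcontig:
                # count this '1' read once for its contig, then stop scanning
                result[poscontig] = result.get(poscontig, 0) + 1
                break
    print(result)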
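For reference: in the PSL format the query (read) name is column 10 and the target sequence name is column 14, so `parsed[9]` and `parsed[13]` are those fields counted from zero. The value the code calls `strand` is therefore the last character of the read name (the mate number in names like `read7/1`), not the PSL strand column (column 9, `parsed[8]`). The following is a minimal sketch of that parsing step; the helper name `parse_pair_fields` and the `PairFields` tuple are hypothetical, and it assumes, as the `[:-2]` slice does, that every read name ends in a two-character suffix such as `/1` or `/2`.

```python
from typing import NamedTuple, Optional


class PairFields(NamedTuple):
    name: str    # read name without the mate suffix
    mate: str    # '1' or '2', the last character of qName
    contig: str  # target (contig) name


def parse_pair_fields(line: str) -> Optional[PairFields]:
    """Extract the fields readPSLpairs uses from one PSL data line.

    Assumes qName ends in a two-character mate suffix like '/1' or '/2'.
    Returns None for lines with too few columns or an unexpected suffix.
    """
    parsed = line.split()
    if len(parsed) < 14:
        return None
    qname = parsed[9]   # PSL column 10: query (read) name
    tname = parsed[13]  # PSL column 14: target (contig) name
    mate = qname[-1]
    if mate not in ("1", "2"):
        return None
    return PairFields(name=qname[:-2], mate=mate, contig=tname)
```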

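I tried restructuring the whole thing so that it would not append the values to lists and then try to match posname == negname and poscontig == negcontig, but that turned out to be much harder than I expected, so I am stuck trying to improve the function as a whole.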
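One way to remove the nested loop while keeping exactly the counts the original code prints is to put every (name, contig) key from the '2' reads into a set. Membership tests on a set take constant time on average, so the whole pass grows with the number of lines rather than with (number of '1' reads) × (number of '2' reads). This is a sketch written as a standalone function over an iterable of PSL lines instead of a method that calls `self.readPSL()`; the name `count_paired_contigs` is hypothetical, and it assumes read names end in `/1` or `/2` as the original slicing does.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def count_paired_contigs(lines: Iterable[str]) -> Dict[str, int]:
    """Count, per contig, the '1' reads whose '2' mate hit the same contig.

    Same result as the nested-loop version, but each lookup is a set
    membership test instead of a scan over every '2' read.
    """
    pos_keys: List[Tuple[str, str]] = []    # (name, contig) of every '1' read, in file order
    neg_keys: Set[Tuple[str, str]] = set()  # distinct (name, contig) of '2' reads

    for line in lines:
        parsed = line.split()
        mate = parsed[9][-1]
        key = (parsed[9][:-2], parsed[13])
        if mate == "1":
            pos_keys.append(key)
        elif mate == "2":
            neg_keys.add(key)

    result: Counter = Counter()
    for name, contig in pos_keys:
        if (name, contig) in neg_keys:  # average O(1) instead of a loop over '2' reads
            result[contig] += 1
    return dict(result)
```

A set is enough here because the original loop only asks whether a matching '2' read exists (it breaks on the first hit), not how many there are. Inside the class it could be called as `count_paired_contigs(self.readPSL())`, assuming `readPSL()` yields the same lines the original loop splits.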

Answer 1

import collections

def readPSLpairs(self):
    all_dict = {"pos": collections.defaultdict(int),
                "neg": collections.defaultdict(int)}

    result = {}

    for psl in self.readPSL():
        parsed = psl.split()
        strand = "pos" if parsed[9][-1] == '1' else "neg"
        name, contig = parsed[9][:-2], parsed[13]
        all_dict[strand][(name, contig)] += 1
    # pre-process all the psl's into all_dict['pos'] or all_dict['neg']
    #   this is basically just a `collections.Counter` of what you're doing already!

    for info, posqty in all_dict['pos'].items():
        negqty = all_dict['neg'][info]  # (defaults to zero)
        if negqty:
            result[info] = posqty * negqty
    # process all the 'pos' psl's. For every match with a 'neg', set
    #   result[(name, contig)] to the total (posqty * negqty)

    print(result)
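The answer's comment notes that the pre-processing step is essentially a `collections.Counter`. Below is a self-contained sketch of the same idea written with `Counter`; the function name `pair_products` is hypothetical, and it takes an iterable of lines instead of calling `self.readPSL()`. Like the answer, it keys on (name, contig) and stores posqty * negqty only where both mates occur, and it assumes read names end in `/1` or `/2`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def pair_products(lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Return {(name, contig): pos_count * neg_count} for keys seen on both mates."""
    counts = {"1": Counter(), "2": Counter()}
    for line in lines:
        parsed = line.split()
        mate = parsed[9][-1]
        if mate in counts:  # skip names without a '1'/'2' suffix
            counts[mate][(parsed[9][:-2], parsed[13])] += 1

    pos, neg = counts["1"], counts["2"]
    # Counter returns 0 for a missing key without inserting it
    return {key: qty * neg[key] for key, qty in pos.items() if neg[key]}
```

Note that multiplying the two counts is the answer's choice: when a key has several '2' hits, the product is larger than the question's count, which adds one per matching '1' read.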

Note that this discards the full parsed psl values and keeps only the name and contig slices.
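If the full parsed rows are still needed (for example, to inspect the alignment coordinates of each matched pair), one option is to group the rows themselves by (name, contig) with `collections.defaultdict(list)` and then pair up the groups that exist for both mates. This is a minimal sketch; the function name `group_pairs` is hypothetical, and it assumes read names end in `/1` or `/2`.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (read name without mate suffix, contig)


def group_pairs(lines: Iterable[str]) -> Dict[Key, Tuple[List[List[str]], List[List[str]]]]:
    """Map each (name, contig) seen on both mates to (its '1' rows, its '2' rows)."""
    groups: Dict[str, DefaultDict[Key, List[List[str]]]] = {
        "1": defaultdict(list),
        "2": defaultdict(list),
    }
    for line in lines:
        parsed = line.split()
        mate = parsed[9][-1]
        if mate in groups:
            groups[mate][(parsed[9][:-2], parsed[13])].append(parsed)

    pos, neg = groups["1"], groups["2"]
    # a dictionary lookup, not a scan, finds the matching '2' rows for each key
    return {key: (rows, neg[key]) for key, rows in pos.items() if key in neg}
```

This keeps every field of every row, so memory use is higher than with plain counts, but the matching step is still a single pass over the '1' groups.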


Solution 1, accepted 2015-11-30 03:15:17
