Alternative to nested loops

Question

I am working on a bash script that compares several positions against given start/end positions. I have two files of different sizes:

File 1: start and end positions (tab-separated)
File 2: single positions

Bash is really slow at processing for loops, so I had the idea of using Python for this part.
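One way to avoid passing the data through bash strings at all is to let Python read both files itself. A minimal sketch, assuming File 1 has one tab-separated start/end pair per line and File 2 holds whitespace-separated integer positions; parse_ranges and parse_positions are hypothetical helper names:

```python
def parse_ranges(text):
    """Parse File 1: one 'start<TAB>end' pair per non-empty line (assumed layout)."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            start, end = line.split()
            ranges.append((int(start), int(end)))
    return ranges


def parse_positions(text):
    """Parse File 2: whitespace-separated integer positions."""
    return [int(x) for x in text.split()]
```

With real files this would be called as, for example, parse_ranges(open("file1.txt").read()), where the file name is only a placeholder.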

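python - << EOF
# posString, startString and endString hold the whitespace-separated numbers
# read from the two files (set earlier in the script; not shown here).

# Convert to int: comparing the raw strings would be lexicographic ("10" < "9")
posList = [int(x) for x in posString.split()]
endList = [int(x) for x in endString.split()]
startList = [int(x) for x in startString.split()]

for val2 in posList:
    for i, val1 in enumerate(startList):
        # position must lie strictly between start and end
        if val1 < val2 < endList[i]:
            print("true", val2)
        else:
            print("false", val2)

EOF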

I have three strings as input (position, start, end) and split them into lists. With the two nested loops I iterate over the bigger position file and then over the start/end file. If my condition is fulfilled (pos > start and pos < end), I would like to print something.

My inputs are strings of whitespace-separated numbers.
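Because the inputs are strings, split() returns lists of strings, and Python compares strings character by character, so "10" < "9" is true. A minimal sketch of the question's condition (pos strictly between start and end) applied after converting to int; in_range is a hypothetical helper name:

```python
def in_range(pos, start, end):
    """True if start < pos < end, i.e. pos > start and pos < end."""
    return start < pos < end


# split() yields strings, which compare lexicographically:
# "10" < "9" because the character "1" sorts before "9".
as_strings = "10 9".split()
as_ints = [int(x) for x in as_strings]

print(as_strings[0] < as_strings[1])  # True  (lexicographic)
print(as_ints[0] < as_ints[1])        # False (numeric)
print(in_range(7, 5, 10))             # True
```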

Maybe I'm completely on the wrong track (I hope not), but with this approach it takes far too long to run.

Thanks a lot for your help.

Answer 1

If you start by sorting the positions and the ranges, you can save a lot of time:

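# Convert the question's strings to ints so comparisons are numeric
start_list = [int(x) for x in startString.split()]
end_list = [int(x) for x in endString.split()]
pos_list = [int(x) for x in posString.split()]

range_sorted_list = sorted(zip(start_list, end_list))
range_sorted_iter = iter(range_sorted_list)
pos_sorted_list = sorted(pos_list)

start, end = next(range_sorted_iter)
exhausted = False

for pos in pos_sorted_list:
    # Skip ranges that end at or before pos; because the positions are
    # sorted, no later position can fall inside them either
    while not exhausted and pos >= end:
        try:
            start, end = next(range_sorted_iter)
        except StopIteration:
            exhausted = True
    if not exhausted and start <= pos < end:
        print("True", pos)
    else:
        # pos lies before the current range, or past every range
        print("False", pos)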

This lets you go over the ranges only once in total, instead of once for every position: the cost drops from len(pos_list) × len(start_list) comparisons to the two sorts plus a single linear sweep.
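Another technique, different from the sort-and-sweep in this answer, is to merge overlapping ranges once and then binary-search each position with Python's standard bisect module. A minimal sketch, assuming integer inputs and the question's strict condition start < pos < end; merge_open_ranges and classify are hypothetical names:

```python
import bisect


def merge_open_ranges(ranges):
    """Merge overlapping open intervals (start, end). Intervals that only
    touch, such as (1, 5) and (5, 10), stay separate because 5 is in neither."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start < merged[-1][1]:
            # Overlaps the previous interval: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def classify(positions, ranges):
    """Return (pos, inside) pairs, where inside means start < pos < end
    for at least one (start, end) in ranges."""
    merged = merge_open_ranges(ranges)
    starts = [start for start, _ in merged]
    result = []
    for pos in positions:
        # Index of the last merged interval whose start is strictly below pos
        i = bisect.bisect_left(starts, pos) - 1
        inside = i >= 0 and pos < merged[i][1]
        result.append((pos, inside))
    return result


print(classify([3, 5, 7, 12, 20], [(1, 5), (5, 10), (11, 15)]))
# [(3, True), (5, False), (7, True), (12, True), (20, False)]
```

Merging costs one sort of the ranges; each lookup is then a binary search, and the positions do not need to be sorted.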

Answer 2

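Itertools is the way to go: itertools.product replaces the two nested loops with a single loop over all (position, start) pairs. Note that it does not use vector operations; it still generates every pair, so the number of comparisons is unchanged and the gain is at most a modest reduction in loop overhead.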

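from itertools import product

# Convert to int so the comparisons are numeric, not lexicographic
posList = [int(x) for x in posString.split()]
endList = [int(x) for x in endString.split()]
startList = [int(x) for x in startString.split()]

# product() yields every (position, (index, start)) pair, replacing the nested loops
for val2, (i, val1) in product(posList, enumerate(startList)):
    if val1 < val2 < endList[i]:
        print("true", val2)
    else:
        print("false", val2)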
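itertools.product(a, b) produces the same pairs, in the same order, as a "for x in a: for y in b:" loop, so it still performs len(posList) × len(startList) comparisons. A small check of that equivalence, using arbitrary example lists:

```python
from itertools import product

positions = [3, 7, 12]
starts = [1, 5]

# Pairs from two explicit nested loops
nested = []
for pos in positions:
    for start in starts:
        nested.append((pos, start))

# Pairs from itertools.product: identical content and order
flat = list(product(positions, starts))

print(flat == nested)  # True
print(len(flat))       # 6, i.e. len(positions) * len(starts)
```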

Answer 1: accepted, score 0, posted 2014-05-09 11:15:37
Answer 2: score 0, posted 2018-03-12 08:16:28
