
Alternative to nested for loop

I am working on a bash script that compares several positions with given start/end positions. I have two files of different sizes:

  • File 1: start and end positions (tab-separated)
  • File 2: single positions
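A minimal sketch of reading the two inputs into Python lists, assuming file 1 holds one start/end pair per line separated by a tab and file 2 holds whitespace-separated positions. The helper names parse_ranges and parse_positions are hypothetical, and the file contents are passed in as strings so the example runs without real files.

```python
from typing import List, Tuple


def parse_ranges(text: str) -> List[Tuple[int, int]]:
    """Parse 'start<TAB>end' lines (assumed format of file 1) into int pairs."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, end = line.split("\t")
        ranges.append((int(start), int(end)))
    return ranges


def parse_positions(text: str) -> List[int]:
    """Parse whitespace-separated positions (assumed format of file 2) into ints."""
    return [int(tok) for tok in text.split()]


file1 = "10\t20\n30\t40\n"
file2 = "5\n15\n35\n"
ranges = parse_ranges(file1)
positions = parse_positions(file2)
print(ranges)     # [(10, 20), (30, 40)]
print(positions)  # [5, 15, 35]
```

With real files you would pass the file contents instead, e.g. parse_ranges(open(path).read()) for some path of your own.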

Bash is really slow at processing for loops, so I had the idea of using Python for this.

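python3 - << EOF
# posString, startString and endString hold the whitespace-separated
# numbers read from the two input files (set up before this point)
posList = [int(x) for x in posString.split()]
endList = [int(x) for x in endString.split()]
startList = [int(x) for x in startString.split()]

# Compare every position with every start/end range
for val2 in posList:
    for i, val1 in enumerate(startList):
        # The position must lie strictly between start and end
        if val1 < val2 < endList[i]:
            print("true", val2)
        else:
            print("false", val2)

EOF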

I have three strings as input (position, start, end) and split them into lists. With the two nested loops, I iterate over the bigger position list and, for each position, over the start/end list. If my condition is fulfilled (pos > start and pos < end), I would like to print something.

My input files are strings of whitespace-separated numbers.
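Because split() returns strings, a comparison such as val1 >= val2 on the raw tokens is lexicographic rather than numeric, which can silently give wrong answers. A short illustration with made-up numbers:

```python
pos_string = "9 100"
start_string = "10 50"

# split() yields strings, so comparisons go character by character
pos_strs = pos_string.split()
start_strs = start_string.split()
print(pos_strs[0] > start_strs[0])  # "9" > "10" is True (lexicographic)

# Converting to int gives the numeric comparison that was intended
pos_ints = [int(x) for x in pos_strs]
start_ints = [int(x) for x in start_strs]
print(pos_ints[0] > start_ints[0])  # 9 > 10 is False
```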

Maybe I'm completely on the wrong track (I hope not), but with this approach it takes far too long.

Thanks a lot for your help.

If you start by sorting the positions and the ranges, you can save a lot of time:

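# Convert the whitespace-separated strings to integers
pos_list = [int(x) for x in posString.split()]
start_list = [int(x) for x in startString.split()]
end_list = [int(x) for x in endString.split()]

range_sorted_iter = iter(sorted(zip(start_list, end_list)))
pos_sorted_list = sorted(pos_list)

# current is None once every range has been consumed
current = next(range_sorted_iter, None)

for pos in pos_sorted_list:
    # Skip ranges that end at or before pos; larger positions cannot fall inside them either
    while current is not None and pos >= current[1]:
        current = next(range_sorted_iter, None)
    if current is not None and current[0] <= pos < current[1]:
        print("True", pos)
    else:
        # pos lies before the current range, or past the last range
        print("False", pos)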

This lets you walk through each sorted list only once, instead of scanning all ranges again for every position: apart from the initial sorting, the work grows linearly with the size of the two inputs.
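To check the sorted approach, here is a minimal sketch that wraps the single pass in a function and compares it with a nested-loop reference on small inputs. It assumes integer positions and half-open ranges [start, end), matching the answer's start <= pos < end test, and it reports one True/False per position (whether the position falls in at least one range). The names sweep and brute_force are hypothetical.

```python
from typing import List, Tuple


def sweep(positions: List[int], ranges: List[Tuple[int, int]]) -> List[Tuple[int, bool]]:
    """Single pass over sorted positions and sorted [start, end) ranges.

    Returns (pos, inside) for every position in ascending order, where
    inside is True if pos lies in at least one range.
    """
    ranges_sorted = sorted(ranges)
    result = []
    i = 0
    for pos in sorted(positions):
        # Ranges ending at or before pos cannot contain this or any later position
        while i < len(ranges_sorted) and ranges_sorted[i][1] <= pos:
            i += 1
        # Here the current range ends after pos, so only its start needs checking
        inside = i < len(ranges_sorted) and ranges_sorted[i][0] <= pos
        result.append((pos, inside))
    return result


def brute_force(positions: List[int], ranges: List[Tuple[int, int]]) -> List[Tuple[int, bool]]:
    """Nested-loop reference: checks every position against every range."""
    return [(pos, any(s <= pos < e for s, e in ranges)) for pos in sorted(positions)]


ranges = [(10, 20), (15, 30), (40, 50)]
positions = [5, 12, 25, 30, 45, 60]
print(sweep(positions, ranges))
# [(5, False), (12, True), (25, True), (30, False), (45, True), (60, False)]
```

Unlike the nested loops in the question, which print one line per (position, range) pair, this version prints one result per position; positions past the last range are still reported as False.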

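Itertools is the way to go. The product function from itertools replaces the two nested loops with a single loop over every (position, start) pair. It does not use vector operations, though: it still makes one comparison per pair, so it shortens the code rather than reducing the amount of work.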

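from itertools import product

# Convert the whitespace-separated strings to integers
posList = [int(x) for x in posString.split()]
endList = [int(x) for x in endString.split()]
startList = [int(x) for x in startString.split()]

# product() yields every (position, (index, start)) pair, like the nested loops
for val2, (i, val1) in product(posList, enumerate(startList)):
    if val1 < val2 < endList[i]:
        print("true", val2)
    else:
        print("false", val2)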
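A quick way to see what product does here: it yields exactly the same pairs, in the same order, as the two nested loops, so the number of comparisons stays len(posList) * len(startList). The sample values below are made up.

```python
from itertools import product

positions = [5, 12, 25]
starts = [10, 15]

# Pairs produced by explicit nested loops
nested = [(p, s) for p in positions for s in starts]

# Pairs produced by itertools.product
flat = list(product(positions, starts))

print(flat)            # [(5, 10), (5, 15), (12, 10), (12, 15), (25, 10), (25, 15)]
print(flat == nested)  # True
print(len(flat))       # 6, i.e. len(positions) * len(starts)
```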

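Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM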