Python: check whether a number falls within multiple ranges in a list

Question

Suppose there are integer lists like these:

a_list = [2501, 2783, 3088, 3980, 465, 1001, 39392911, 39394382, 488955,489087, ......]
b_list = [474, 498, 47478821, 47479800, 3774, 8970, 484000, 486000......]

Every 2 numbers indicate a range of natural numbers. For example, the ranges of a_list would be:

2501     2783      # 2501, 2502, 2503, 2504, 2505, 2506, ......, 2783
3088     3980 
465      1001 
39392911 39394382 
488955   489087
......

For a given number, find the range it belongs to, with priority a_list > b_list, i.e. if a range is found in a_list, stop searching and move on to the next number.

I did a test run searching for around 50 numbers, which took about 7 minutes. I have a big dataset of possibly 20 million numbers that need to be searched this way.

How can I code this so it runs faster?

============= more conditions and information =============

There could be more than 10 thousand numbers in each list.
There could be up to 30 million numbers to search for.
The size of each list is always n * 2.
a_list: [1st < 2nd, 3rd < 4th, ......]
The numbers in the lists might occur more than once.
The priority: a_list > b_list.

My code is as follows:

from itertools import izip  # Python 2; on Python 3 use the built-in zip

hasFound = 0

if hasFound == 0:
    for x, y in izip(*[iter(a_list)]*2):   # gives every 2 numbers
        if aNumber in range(x,y):
            a_list_counter +=1 
            hasFound = 1
            break

if hasFound == 0:       
    for x, y in izip(*[iter(b_list)]*2):
        if aNumber in range(x,y):
            b_list_counter += 1
            hasFound = 1
            break

Many thanks in advance.

Answer 1

Toss them all in one big dictionary:

a_list = [2501, 2783, 3088, 3980, 465, 1001, 39392911, 39394382, 488955,489087, ......]
b_list = [474, 498, 47478821, 47479800, 3774, 8970, 484000, 486000......]
# into
ranges = {'a': [2501, 2783, 3088, 3980, 465, 1001, 39392911, 39394382, 488955,489087, ......],
          'b': [474, 498, 47478821, 47479800, 3774, 8970, 484000, 486000......]}

Then go through each list in order, mostly the way you were doing it before:

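numbers = []  # fill in your target numbers
scores = {}   # dict to store results in

for number in numbers:
    # sorted() visits 'a' before 'b', which gives a_list its priority
    for range_name in sorted(ranges):
        range_list = ranges[range_name]
        groups = zip(*[iter(range_list)] * 2)  # (start, end) pairs
        if any(start <= number < end for start, end in groups):
            # setdefault(...) += 1 is a syntax error, so use get() instead
            scores[range_name] = scores.get(range_name, 0) + 1
            break  # found a range: move on to the next number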
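
The pairing idiom used above, zip(*[iter(range_list)] * 2), can be easy to misread. A small sketch of what it produces, using the first few values of a_list from the question; the helper name pairs is made up here for illustration:

```python
def pairs(flat):
    """Group a flat list [s1, e1, s2, e2, ...] into [(s1, e1), (s2, e2), ...]."""
    it = iter(flat)
    # zip pulls from the same iterator twice per tuple, so items come out in twos
    return list(zip(it, it))

sample = [2501, 2783, 3088, 3980, 465, 1001]
print(pairs(sample))
```

This relies on the list length being even, which the question states is always the case.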

Alternatively (and I'm not sure if this is faster or not) you could do:

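for number in numbers:
    for range_name in sorted(ranges):
        range_list = ranges[range_name]  # renamed to avoid shadowing the built-in range
        # an odd position after sorting means the number sits after a start and before its end
        if sorted(range_list + [number]).index(number) % 2:
            scores[range_name] = scores.get(range_name, 0) + 1
            break  # a_list has priority: stop once a range is found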

In this case you're throwing the new number into the list, sorting it (which is fast using Timsort), and seeing if it falls between a start and an end. Note that this only works if the ranges within a list do not overlap, and a number exactly equal to a range's start is not counted.
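
If the ranges stay fixed while many numbers are looked up, the same "where would it land in sorted order" idea can be done once up front instead of once per number. The following is a minimal sketch, not part of the original answer, assuming integer ranges with inclusive end points (as in the question's 2501..2783 example). The names build_index, contains and classify are hypothetical. It sorts and merges overlapping ranges once, then uses the standard bisect module for each lookup:

```python
from bisect import bisect_right
from collections import Counter

def build_index(flat):
    """Turn [s1, e1, s2, e2, ...] into sorted, merged (starts, ends) lists."""
    it = iter(flat)
    merged = []
    for start, end in sorted(zip(it, it)):
        if merged and start <= merged[-1][1] + 1:
            # overlaps or touches the previous range: extend it (integers assumed)
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]
    ends = [e for _, e in merged]
    return starts, ends

def contains(index, number):
    """True if number lies in one of the merged ranges (end points inclusive)."""
    starts, ends = index
    i = bisect_right(starts, number) - 1  # last range starting at or before number
    return i >= 0 and number <= ends[i]

def classify(number, a_index, b_index):
    """Return 'a', 'b' or None, checking a_list first to honour its priority."""
    if contains(a_index, number):
        return 'a'
    if contains(b_index, number):
        return 'b'
    return None

a_list = [2501, 2783, 3088, 3980, 465, 1001, 39392911, 39394382, 488955, 489087]
b_list = [474, 498, 47478821, 47479800, 3774, 8970, 484000, 486000]
a_index = build_index(a_list)
b_index = build_index(b_list)

counts = Counter(classify(n, a_index, b_index) for n in [480, 5000, 2000, 2783])
print(counts)
```

Each lookup is then a binary search over the merged ranges rather than a scan of every pair, which should matter at the stated scale of tens of millions of numbers. Whether it is fast enough for a given dataset would need to be measured.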

Answer 1
1 accepted 2015-02-16 20:36:07
