简体   繁体   中英

Get the number of integers between each given range

The task:

Write a function that receives 3 arrays and returns an array. The first array contains n integers, their values range between 0 and 10^9. "numbers". The second array is a low-range array, which contains the lower end of a range, it contains q integers. "low". The third array is a high-range array, which contains the higher end of a range, it contains q integers. "high".

The function should return an array that contains the number of integers in the first array, that fall in its range, given by the low-range and high-range arrays.

In the returned array, at index i, there should be the number of integers in "numbers" which are bigger or equal to low[i] and smaller or equal to high[i].

Examples:
count_range([12,13,14,15,17],[14],[14]) should return [1]
count_range([12,13,14,15,17],[14,15],[14,18]) should return [1,2]
count_range([12,13,14,15,17],[12],[17]) should return [5]

This is how I solved the question but I feel like there might be a more efficient way to solve seeing as the arrays could be long and the numbers could be really big.

I'd be glad to get some insights or tests that challenge this could to help me think in a better direction.

def binarySearch(data, val):
    highIndex = len(data) - 1
    lowIndex = 0
    while highIndex > lowIndex:
        index = math.ceil((highIndex + lowIndex) / 2)
        sub = data[index]
        if sub > val:
            if highIndex == index:
                return sorted([highIndex, lowIndex])
            highIndex = index
        else:
            if lowIndex == index:
                return sorted([highIndex, lowIndex])
            lowIndex = index
    return sorted([highIndex, lowIndex])


def count_range(numbers, low, high):
    numbers.sort()
    result = []
    range_dict = {}
    for i in range(len(numbers)):
        if numbers[i] not in range_dict:
            range_dict[numbers[i]] = i
    for i in range(len(low)):
        low_r = low[i]
        high_r = high[i]
        if low_r not in range_dict:
            range_dict[low_r] = binarySearch(numbers, low_r)[0]
        low_index = range_dict.get(low_r)
        if high_r not in range_dict:
            range_dict[high_r] = binarySearch(numbers, high_r)[0]
        high_index = range_dict.get(high_r)
        while high_index+1 < len(numbers) and numbers[high_index + 1] == numbers[high_index]:
            high_index += 1
        if low_r in numbers or low_r < numbers[0]:
            low_index -= 1
        result.append(high_index - low_index)
    print(result)
    return result

Using Python's bisect :

from bisect import bisect_left, bisect_right

def count_range(numbers, low, high):
    numbers.sort()
    return [bisect_right(numbers, hi) - bisect_left(numbers, lo)
            for hi, lo in zip(high, low)]

In my solution, the sorting actually took about 90% of my time, then handling the queries only took about 10% of my time.

Benchmark with 100,000 numbers and 1000 ranges ( Try it online! ):

  71.6 ms  count_range_Sara
  24.1 ms  count_range_Kelly
6303.6 ms  count_range_Nin17

  70.5 ms  count_range_Sara
  23.1 ms  count_range_Kelly
6225.0 ms  count_range_Nin17

  64.5 ms  count_range_Sara
  23.1 ms  count_range_Kelly
6306.7 ms  count_range_Nin17

(Note: Sara's solution I measured is the original slightly buggy one, which I included despite the buggyness as the question is about efficiency. And Nin17's solution that I measured is their original one from their now deleted answer, not the NumPy one.)

With numpy:

import numpy as np
def count_range(numbers, low, high):
    numbers.sort()
    return np.searchsorted(numbers, high, 'right') - np.searchsorted(numbers, low, 'left')

Benchmark:

from timeit import timeit
def test_data():
    RANGE = range(10**9)
    numbers = random.choices(RANGE, k=10**5)
    ranges = (sorted(random.sample(RANGE, 2))
              for _ in range(1000))
    [*low], [*high] = zip(*ranges)
    return [numbers, low, high]
test = test_data()
%timeit kelly(*[t.copy() for t in test])
test2 = [np.array(i) for i in test]
%timeit Nin17(*[t.copy() for t in test2])

Output:

13.4 ms ± 77.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.63 ms ± 14 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM