
The fastest way to find 2 numbers from two lists that in sum equal to x

My code:

n = 3
a1 = 0
b1 = 10
a2 = 2
b2 = 2

if b1>n:
    b1=n
if b2>n:
    b2=n

diap1 = [x for x in range(a1, b1+1)]
diap2 = [x for x in range(a2, b2+1)]

def pairs(d1, d2, n):
    res = 0
    same = 0
    sl1 = sorted(d1)
    sl2 = sorted(d2)
    for i in sl1:
        for j in sl2:
            if i+j==n and i!=j:
                res+=1
            elif i+j==n and i==j:
                same+=1
    return(res+same)

result = pairs(diap1, diap2, n)
print(result)

NOTE: n, a1, b1, a2 and b2 can change. The code should find pairs of numbers, one from each list, whose sum equals n. For example, the pairs (a, b) and (b, a) are different, but (a, a) and (a, a) are the same pair. The output of my code is correct (for the input above it is 1, the pair (1, 2)), but for big inputs it takes too much time. How can I optimize it to run faster?

Use set() for fast lookup...

setd2 = set(d2)

Don't try all possible number pairs. Once you fix on a number from the first list, say i, just check whether (n - i) is in the second set.

for i in sl1:
    if (n-i) in setd2:
        pass  # found match: i + (n-i) == n
    else:
        pass  # no match in setd2 for i
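To connect the set lookup back to the counting in the question, here is a minimal sketch. The function name count_pairs is hypothetical, and the sketch assumes each list holds distinct values (true for the ranges in the question); duplicate values would be collapsed by the set.

```python
def count_pairs(d1, d2, n):
    """Count pairs (i, j) with i from d1, j from d2 and i + j == n.

    Assumption: the values within each list are distinct.
    """
    lookup = set(d2)  # built once; membership tests are O(1) on average
    return sum(1 for i in d1 if (n - i) in lookup)


# Same input as the question after clipping b1 to n: diap1 = 0..3, diap2 = [2]
print(count_pairs(range(0, 4), range(2, 3), 3))
```

This does one pass over d1 instead of a nested loop over both lists.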

The following approach is fast: it finds the pairs of numbers whose sum equals n and stores them in a list of tuples.

s1 = set(list1)
s2 = set(list2)
nums = []
for item in s1:
    if n-item in s2:
        nums.append((item, n-item))

The accepted answer is really easy to understand and implement, but I just had to share this method. You can see that your question is the same as this one.
This answer in particular is interesting because it does not need the extra space that inserting into sets requires. I'm including the algorithm here in my answer.

If the arrays are sorted you can do it in linear time and constant storage.

  • Start with two pointers, one pointing at the smallest element of A, the other pointing to the largest element of B.
  • Calculate the sum of the pointed to elements.
  • If it is smaller than k increment the pointer into A so that it points to the next largest element.
  • If it is larger than k decrement the pointer into B so that it points to the next smallest element.
  • If it is exactly k you've found a pair. Move one of the pointers and keep going to find the next pair.

If the arrays are initially unsorted then you can first sort them then use the above algorithm.
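The quoted steps could be sketched as follows. The function name two_pointer_pairs is hypothetical, k is the target sum as in the quoted answer, and the sketch assumes both lists are sorted ascending with no repeated values.

```python
def two_pointer_pairs(a, b, k):
    """Return all (x, y) with x from a, y from b and x + y == k.

    Assumptions: a and b are sorted ascending and contain no duplicates.
    """
    pairs = []
    i, j = 0, len(b) - 1  # i at the smallest of a, j at the largest of b
    while i < len(a) and j >= 0:
        total = a[i] + b[j]
        if total < k:
            i += 1  # need a larger sum: advance in a
        elif total > k:
            j -= 1  # need a smaller sum: step back in b
        else:
            pairs.append((a[i], b[j]))
            i += 1  # move one pointer and keep going
    return pairs


print(two_pointer_pairs([0, 1, 2, 3], [2], 3))
```

Each pointer only moves in one direction, so the loop runs at most len(a) + len(b) times.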

Thank you for clearly defining your question and for providing your code example that you are attempting to optimize.

Utilizing two key definitions from your question and the notation you provided, I limited my optimization attempt to the use of lists, and added the ability to randomly change the values associated to n, a1, b1, a2 and b2.

To show the optimization results, I created a module that uses the random.randint function to create a variety of list sizes and the timeit.Timer class to measure the time taken by your original pairs() function and by my suggested optimization in the pairs2() function.

In the pairs2() function, you will note that each loop contains a break statement. These stop the iteration through each list once the desired criterion is met. Note that because of these breaks pairs2() stops at the first matching pair, so it returns at most 1 rather than the full count. As the lists grow, the time advantage of pairs2() over pairs() increases.

Test module code:

import random
from timeit import Timer

max_value = 10000
n =  random.randint(1, max_value)
a1 = random.randint(0, max_value)
b1 = random.randint(1, max_value+1)
a2 = random.randint(0, max_value)
b2 = random.randint(1, max_value+1)

if b1>n:
    b1=n
if b2>n:
    b2=n

if a1>=b1:
    a1 = random.randint(0, b1-1)
if a2>=b2:
    a2 = random.randint(0, b2-1)

diap1 = [x for x in range(a1, b1)]
diap2 = [x for x in range(a2, b2)]
print("Length diap1 =", len(diap1))
print("Length diap2 =", len(diap2))

def pairs(d1, d2, n): 
    res = 0 
    same = 0    
    sl1 = sorted(d1)
    sl2 = sorted(d2)
    for i in sl1:
        for j in sl2:
            if i+j==n and i!=j:                 
                res+=1                                          
            elif i+j==n and i==j:
                same+=1
    return(res+same)

def pairs2(d1, d2, n): 
    res = 0 
    same = 0    
    sl1 = sorted(d1)
    sl2 = sorted(d2)
    for i in sl1:
        for j in sl2:
            if i+j==n and i!=j:                 
                res+=1
                break                                      
            elif i+j==n and i==j:
                same+=1
                break
        if res+same>0:
            break
    return(res+same)

if __name__ == "__main__":
    # Timer runs its statement in its own namespace, so assigning to `result`
    # inside the timed statement would not update a variable printed here.
    timer = Timer("pairs(diap1, diap2, n)",
                  "from __main__ import diap1, diap2, n, pairs")
    print("pairs_time = ", timer.timeit(number=1),
          "result =", pairs(diap1, diap2, n))

    timer = Timer("pairs2(diap1, diap2, n)",
                  "from __main__ import diap1, diap2, n, pairs2")
    print("pairs2_time = ", timer.timeit(number=1),
          "result =", pairs2(diap1, diap2, n))

If you take a value from the first list and then search the second list for a value m such that the sum matches the searched value, you can take a few shortcuts. For example, if the sum is too small, every value in the second list that is smaller than or equal to m will also fail to give the right sum. The same holds, with the directions reversed, if the sum is too large.

Using this info, I'd use following steps:

  • Set up two heaps, one minimum heap, one maximum heap.
  • Look at the top elements of each heap:
    • If the sum matches the searched value, you are done.
    • If the sum exceeds the searched value, remove the value from the maximum heap.
    • If the sum is less than the searched value, remove the value from the minimum heap.
  • If either heap is empty, there is no solution.

Note that using a heap is an optimization over sorting the two sequences right away. However, if you often have no match, sorting the numbers before running the algorithm might be faster, because a good sorting algorithm will outperform the implicit sorting done by the heaps, not in asymptotic complexity but by constant factors.
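The heap-based steps in this answer might look like the sketch below. The function name find_pair_with_heaps is hypothetical. It uses the standard heapq module, which provides a min-heap, so the maximum heap is emulated here by storing negated values. It returns only the first pair found, as the steps describe.

```python
import heapq


def find_pair_with_heaps(d1, d2, target):
    """Return one (x, y) with x from d1, y from d2 and x + y == target, or None."""
    min_heap = list(d1)
    heapq.heapify(min_heap)      # smallest value of d1 on top
    max_heap = [-y for y in d2]  # negated so the largest value of d2 is on top
    heapq.heapify(max_heap)
    while min_heap and max_heap:
        x, y = min_heap[0], -max_heap[0]
        if x + y == target:
            return (x, y)
        if x + y > target:
            heapq.heappop(max_heap)  # sum too large: drop the current maximum
        else:
            heapq.heappop(min_heap)  # sum too small: drop the current minimum
    return None                      # a heap ran empty: no solution


print(find_pair_with_heaps([0, 1, 2, 3], [2], 3))
```

Building each heap is linear, and only as many elements are popped as the search needs, which is where the saving over a full sort comes from when a match is found early.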
