Given 3 lists, find which two elements in the first two lists sum as close as possible to each value in the third list

I am given 3 lists. For each value in the third list, I have to find two values, one from each of the first two lists, whose sum is as close as possible to it, and return their indices (one-based). If multiple solutions are equally close, either one may be returned.

I have a working solution, and while it works on semi-large inputs, it is too slow for large inputs (for example, all 3 lists of length 10,000).

So the question is basically: how can you find an exact solution to this problem without having to calculate every possible combination of list1 and list2?
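To pin down what "as close as possible" means, and to have something to cross-check faster versions against on small inputs, here is a brute-force reference. It is a hypothetical helper (the name brute_force is an assumption, not the asker's code), it assumes the lists are already parsed into floats, and it is O(m·k·n), so it is only practical for small inputs.

```python
def brute_force(list1, list2, list3):
    """Hypothetical reference helper: for each target in list3, try every
    (i, j) pair and keep the one whose sum is closest to the target.

    Returns one-based (i, j) index pairs, one per target, in list3's order.
    Runs in O(len(list1) * len(list2) * len(list3)) time.
    """
    answers = []
    for t in list3:
        best_pair, best_dist = None, float("inf")
        for i, a in enumerate(list1, start=1):
            for j, b in enumerate(list2, start=1):
                d = abs(a + b - t)
                if d < best_dist:  # strict: the first of several ties wins
                    best_pair, best_dist = (i, j), d
        answers.append(best_pair)
    return answers


if __name__ == "__main__":
    # Third sample test case from the question.
    list1 = [0.000001, 0.000002, 0.000003, 0.000004, 0.000005]
    list2 = [0.000002, 0.000010, 0.000001, -0.000001]
    list3 = [0.000001, 0.000002, 0.000100, 0.000005, 0.000020, 0.000010, 0.000003]
    for i, j in brute_force(list1, list2, list3):
        print(i, j)
```

Where several pairs are equally close, this keeps the first one found, so it can legitimately print a different pair than the sample output for tied targets.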

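Sample input (the first line is the number of test cases; each test case then has a line m k n with the lengths of the three lists, followed by list1, list2 and list3):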

3
2 2 5
1.000002 0.000002
0.500000 -0.500000
0.500001 0.500002 0.500003 1.000000 0.000001
2 2 5
1.000002 0.000001
0.500000 -0.500000
0.500001 0.500002 0.500003 1.000000 0.000001
5 4 7
0.000001 0.000002 0.000003 0.000004 0.000005
0.000002 0.000010 0.000001 -0.000001
0.000001 0.000002 0.000100 0.000005 0.000020 0.000010 0.000003

Sample output (newlines between test cases added for readability; they are not present in the script output):

2 1
2 1
2 1
2 1
2 2

2 1
1 2
1 2
1 2
2 2

2 4
3 4
5 2
4 3
5 2
1 2
4 4

My current solution:

"""Given an input file, which contains multiple sets of 3 lists each,
find two values in list1 and list2 whose sum is as close as possible to each
element in list3"""

from bisect import bisect_left
from sys import argv, stderr
from time import time
start = time()

def parser(file):
    """Reads a file, returns it as a list of reads, where each read contains
    an info line, list1, list2, list3"""
    with open(file, 'r') as f:
        read = [line.strip() for line in f]
    tests = int(read.pop(0))
    reads = []
    for n in range(tests):
        reads.append(read[4 * n:4 * (n + 1)])
    return reads

def dict_of_sums(list1, list2):
    """Creates a dict, whose keys are the sums of all values in list1 and
    list2, and whose values are the one-based indices of those values in
    list1 and list2 (if two pairs share a sum, the last one is kept)"""
    sums = {}
    for a, x in enumerate(list1):
        for b, y in enumerate(list2):
            # Round so equal sums map to the same key despite float noise;
            # 'total' avoids shadowing the built-in sum()
            total = round(float(x) + float(y), 6)
            sums[total] = str(a + 1) + ' ' + str(b + 1)
    return sums

def find_best_combination(ordered, list3, c):
    """Finds the closest sum to list3[c] in the sorted list 'ordered' using
    binary search. Returns that sum"""
    num = float(list3[c])
    # bisect_left gives the first position whose sum is >= num, so the
    # closest sum is either there or just before it; this also handles num
    # lying below the smallest or above the largest sum
    pos = bisect_left(ordered, num)
    candidates = ordered[max(pos - 1, 0):pos + 1]
    return min(candidates, key=lambda value: abs(value - num))

reads = parser(argv[1])
for i in reads:
    m, k, n = map(int, i.pop(0).split())
    list1, list2, list3 = i[0].split(), i[1].split(), i[2].split()
    results = dict_of_sums(list1, list2)
    # Create an ordered list of all possible sums of the values in list1 and
    # list2
    ordered = sorted(results)
    # Loops over list3, searching the closest sum. Prints the indices of its
    # constituent numbers in list1 and list2
    for c in range(n):
        res = find_best_combination(ordered, list3, c)
        print(results[res])

end = time()
# Timing goes to stderr so it does not mix with the answers on stdout
print(end - start, file=stderr)

Your current solution is O(n^2 log(n)) time and O(n^2) memory. The reason is that your list ordered has n^2 entries, which you then sort and do lots and lots of binary searches on. This gives you much poorer constants, and a chance of going into swap.

In your case of 10,000 each, you have a dictionary with 100,000,000 keys that you then sort and walk through. That is billions of operations and gigabytes of data. If your machine winds up in swap, those operations will slow down a lot and you have a problem.

I would suggest that you sort lists 2 and 3. Then, for each l1 in list 1, you can walk through the sums l1 + l2 in parallel with walking through l3, recording the best approximation found so far for each l3. Because list 3 is visited in ascending order, the position in list 2 never has to move backwards. Here that is in pseudo-code:

record best for every element in list 3 to be list1[1] + list2[1] (indices 1 1)
sort list2 and list3, remembering each element's original index
foreach l1 in list 1:
    start l2 at start of list2
    foreach l3 in list3, in ascending order:
        while can advance in list 2 and l1 + (next l2) is at least as close to l3 as l1 + l2:
            advance in list 2
        if l1 + l2 is better than best recorded approx of l3:
            record l1 + l2, with the original indices of l1 and l2, as best for l3
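A minimal Python sketch of the sweep described above, assuming the three lists have already been parsed into non-empty lists of floats. The function name closest_pair_sums is hypothetical. It returns, for each target in list3 (in the original order), a pair of one-based indices into the original, unsorted list1 and list2.

```python
def closest_pair_sums(list1, list2, list3):
    """For each target in list3 (original order), return one-based indices
    (i, j) such that list1[i-1] + list2[j-1] is as close as possible to it.

    Assumes all three lists are non-empty lists of floats.
    Time: O(m*(k + n)) for the sweep plus O(k log k + n log n) for sorting.
    Extra memory: O(k + n).
    """
    # Sort list2 and list3 by value, remembering original positions so that
    # the answer can be reported in terms of the unsorted input lists.
    order2 = sorted(range(len(list2)), key=list2.__getitem__)
    order3 = sorted(range(len(list3)), key=list3.__getitem__)
    vals2 = [list2[j] for j in order2]

    best_dist = [float("inf")] * len(list3)
    best_pair = [None] * len(list3)

    for i, a in enumerate(list1):
        p = 0  # position in vals2; only moves forward while targets increase
        for t_idx in order3:  # visit targets in ascending order
            t = list3[t_idx]
            # |a + vals2[p] - t| falls and then rises as p grows, so step
            # forward while the next sum is at least as close to t.
            while (p + 1 < len(vals2)
                   and abs(a + vals2[p + 1] - t) <= abs(a + vals2[p] - t)):
                p += 1
            d = abs(a + vals2[p] - t)
            if d < best_dist[t_idx]:  # strictly better: keep earlier ties
                best_dist[t_idx] = d
                best_pair[t_idx] = (i + 1, order2[p] + 1)
    return best_pair


if __name__ == "__main__":
    # Third sample test case from the question.
    list1 = [0.000001, 0.000002, 0.000003, 0.000004, 0.000005]
    list2 = [0.000002, 0.000010, 0.000001, -0.000001]
    list3 = [0.000001, 0.000002, 0.000100, 0.000005, 0.000020, 0.000010, 0.000003]
    for i, j in closest_pair_sums(list1, list2, list3):
        print(i, j)
```

Ties may be broken differently from the sample output, which the question allows. In pure CPython the inner loop still runs on the order of 2 × 10^8 steps for the 10,000-element case, so expect it to be noticeably slow; this sketch fixes the memory blow-up and the asymptotic cost, not the interpreter overhead.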

This requires sorted versions of list2 and list3, and a lookup from each element of list3 to its best approximation so far. In your example of 10,000 items each, you have 2 data structures of size 10,000, and have to do roughly 10,000 × (10,000 + 10,000) = 200,000,000 operations. That is far better than billions, with no danger of pressing against memory limits.
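A rough count of the dominant terms for m = k = n = 10^4 makes the comparison concrete. It counts comparisons and loop steps only and ignores constant factors, so read the results as orders of magnitude:

```latex
\begin{aligned}
\text{sorting all pair sums:}\quad & mk\log_2(mk) = 10^{8}\cdot\log_2 10^{8} \approx 10^{8}\cdot 26.6 \approx 2.7\times10^{9}\\
\text{two-pointer sweep:}\quad & m(k+n) = 10^{4}\cdot(10^{4}+10^{4}) = 2\times10^{8}\\
\text{sorting list2 and list3:}\quad & k\log_2 k + n\log_2 n \approx 2\cdot 10^{4}\cdot 13.3 \approx 2.7\times10^{5}\\
\text{extra memory:}\quad & mk = 10^{8}\ \text{sums} \quad\text{vs.}\quad k+n = 2\times10^{4}\ \text{values}
\end{aligned}
```

So the sweep wins by roughly an order of magnitude in operation count and by several orders of magnitude in memory.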
