Given 3 lists, find which two elements in the first two lists sum as close as possible to each value in the third list
I am given 3 lists, and I have to find two values in the first two lists whose sum is as close as possible to each value in the third list, and I have to return their indices (one-based indexing). If multiple solutions are equally close, either one may be returned.
I have a working solution, and while it worked on semi-large inputs, it is too slow for large inputs (all 3 lists of length 10000, for example).
So the question is basically: how can you find an exact solution to this problem, without having to calculate every possible combination of list1 and list2?
Sample input:
3
2 2 5
1.000002 0.000002
0.500000 -0.500000
0.500001 0.500002 0.500003 1.000000 0.000001
2 2 5
1.000002 0.000001
0.500000 -0.500000
0.500001 0.500002 0.500003 1.000000 0.000001
5 4 7
0.000001 0.000002 0.000003 0.000004 0.000005
0.000002 0.000010 0.000001 -0.000001
0.000001 0.000002 0.000100 0.000005 0.000020 0.000010 0.000003
Sample output (newlines added for readability, so not present in the script output):
2 1
2 1
2 1
2 1
2 2
2 1
1 2
1 2
1 2
2 2
2 4
3 4
5 2
4 3
5 2
1 2
4 4
My current solution:
"""Given an input file, which contains multiple sets of 3 lists each,
find two values in list1 and list2 whose sum is as close as possible to each
element in list3"""
from sys import argv
from time import time
start = time()
def parser(file):
"""Reads a file, returns it as a list of reads, where each read contains
an info line, list1, list2, list3"""
lines = open(argv[1], 'r').readlines()
read = []
tests = int(lines.pop(0))
for line in lines:
read.append(line.strip())
reads = []
for n in range(tests):
reads.append(read[4 * n:4*(n+1)])
return reads
def dict_of_sums(list1, list2):
"""Creates a dict, whose keys are the sums of all values in list1 and
list2, and whose values are the indices of those values in list1 and
list2"""
sums = {}
m = len(list1)
k = len(list2)
for a in range(m):
for b in range(k):
combination = str(a + 1) + ' ' + str(b + 1)
sum = float(list1[a]) + float(list2[b])
sum = round(sum, 6)
sums[sum] = combination
return sums
def find_best_combination(ordered, list3, c):
"""Finds the best combination using binary search: takes a number c,
and searches through the ordered list to find the closest sum.
Returns that sum"""
num = float(list3[c])
lower, upper = 0, len(ordered)
while True:
idx = (lower + upper) // 2
value = ordered[idx]
if value == num:
return value
if value > num:
upper = idx
elif value < num:
lower = idx
if lower + 1 == upper:
for z in [-1, 0, 1]:
totest = idx + z
if z == -1:
delta = (ordered[totest] - num) ** 2
best = totest
else:
deltanew = (ordered[totest] - num) ** 2
if deltanew < delta:
delta = deltanew
best = totest
return ordered[best]
reads = parser(argv[1])
for i in reads:
m, k, n = i.pop(0).split()
m, k, n = int(m), int(k), int(n)
list1, list2, list3 = i[0].split(), i[1].split(), i[2].split()
results = dict_of_sums(list1, list2)
ordered = []
# Create an ordered list of all possible sums of the values in list1 and
# list2
for k in results.keys():
ordered.append(k)
ordered = sorted(ordered)
# Loops over list3, searching the closest sum. Prints the indices of its
# constituent numbers in list1 and list2
for c in range(n):
res = find_best_combination(ordered, list3, c)
results[res]
end = time()
print(end - start)
Your current solution is O(n^2 log(n)) time and O(n^2) memory. The reason is that your ordered is a list of size n^2 that you then sort, and do lots and lots of binary searches on. This gives you much poorer constants, and a chance of going into swap.
In your case of 10,000 each, you have a dictionary with 100,000,000 keys, which you then sort and walk through. That is billions of operations and GB of data. If your machine winds up in swap, those operations will slow down a lot and you have a problem.
I would suggest that you sort lists 2 and 3. For each l1 in list 1, that lets you walk through l1+l2 in parallel with walking through l3, finding the best approximation of each element of l3. Here that is in pseudo-code:
record best for every element in list 3 to be list1[1] + list2[1]
foreach l1 in list 1:
    start l2 at start of list2
    start l3 at start of list3
    while can advance in both list 2 and list 3:
        if advancing in list2 improves l1 + l2 as approx of l3:
            advance in list 2
        else:
            if l1 + l2 is better than best recorded approx of l3:
                record l1 + l2 as best for l3
            advance in list 3
    while can advance in list 3:
        if l1 + l2 is better than best recorded approx of l3:
            record l1 + l2 as best for l3
        advance in list 3
    if l1 + l2 is better than best recorded approx of l3:
        record l1 + l2 as best for l3
This requires sorted versions of list2 and list3, and a lookup from list3 to the best approximation so far. In your example of 10,000 items each, you have 2 data structures of size 10,000, and have to do roughly 200,000,000 operations. Better than billions, and no problems with pressing against memory limits.
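The pseudo-code above can be sketched in Python roughly as follows. This is my own illustration, not the answer's code: function and variable names (closest_pairs, s2, s3, best) are made up, and it uses one forward-only pointer into sorted list2 per element of list1, which is equivalent to the parallel walk described above. It returns one-based index pairs, breaking ties arbitrarily, as the problem allows:

```python
def closest_pairs(list1, list2, list3):
    """For each target in list3, return a 1-based pair (i, j) such that
    list1[i-1] + list2[j-1] is as close as possible to the target.
    Roughly O(m*(k + n)) after sorting, versus O(m*k) sums sorted."""
    # Sort list2 and list3 by value, remembering original positions.
    s2 = sorted(range(len(list2)), key=lambda j: list2[j])
    s3 = sorted(range(len(list3)), key=lambda t: list3[t])
    best = [(float('inf'), 0, 0)] * len(list3)  # (error, i, j) per list3 slot
    for i1, v1 in enumerate(list1):
        j = 0  # only ever advances: targets are visited in increasing order
        for t in s3:
            target = list3[t]
            # Advance in sorted list2 while the next sum is at least as good.
            while (j + 1 < len(s2) and
                   abs(v1 + list2[s2[j + 1]] - target)
                   <= abs(v1 + list2[s2[j]] - target)):
                j += 1
            err = abs(v1 + list2[s2[j]] - target)
            if err < best[t][0]:
                best[t] = (err, i1 + 1, s2[j] + 1)
    return [(i, j) for _, i, j in best]
```

On the sample sets this produces pairs whose sums are exactly as close as the expected output's; the indices themselves may differ where several pairs are equally close.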