How to align two lists of numbers
I have two sorted lists of numbers A and B, where B is at least as long as A. Say:
A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
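I would like to associate each number in A with a different number in B, while preserving order. For any such mapping, we define the total distance to be the sum of the squared distances between mapped numbers.
For example:
If we map 1.1 to 0, then 2.3 can be mapped to any number from 1.9 onwards. But if we had mapped 1.1 to 2.7, then 2.3 could only be mapped to a number in B from 8.4 onwards.
Say we map 1.1->0, 2.3->1.9, 5.6->8.4, 5.7->9.1, 10.1->10.7. This is a valid mapping and has distance (1.1^2+0.4^2+2.8^2+3.4^2+0.6^2).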
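To make the definition concrete, here is a minimal sketch that checks a mapping and computes its total distance. The helper name `mapping_cost` and the convention of describing a mapping as a list of indices into B are assumptions made for illustration; they are not part of the question.

```python
def mapping_cost(A, B, idx):
    """Total squared distance when A[i] is mapped to B[idx[i]].

    Raises ValueError if idx is not a valid order-preserving mapping.
    """
    if len(idx) != len(A):
        raise ValueError("need exactly one index into B per element of A")
    if any(not 0 <= j < len(B) for j in idx):
        raise ValueError("index into B out of range")
    if any(later <= earlier for earlier, later in zip(idx, idx[1:])):
        raise ValueError("indices into B must be strictly increasing")
    return sum((a - B[j]) ** 2 for a, j in zip(A, idx))


A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
# 1.1->0, 2.3->1.9, 5.6->8.4, 5.7->9.1, 10.1->10.7
print(mapping_cost(A, B, [0, 1, 4, 5, 6]))  # about 21.13, up to float rounding
```

A mapping such as `[3, 1, 4, 5, 6]` (1.1->2.7 followed by 2.3->1.9) is rejected because it does not preserve order.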
Another example, showing that a greedy approach will not work:
A = [1, 2]
B = [0, 1, 10000]
If we map 1->1, then we have to map 2->10000, which is bad.
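The failure of the greedy approach can be reproduced with a short sketch. Both `greedy_nearest` and `brute_force` are hypothetical helpers written for this illustration; the brute force tries every increasing choice of indices and is only usable on tiny inputs.

```python
from itertools import combinations


def total_cost(A, B, idx):
    """Sum of squared distances for the mapping A[i] -> B[idx[i]]."""
    return sum((a - B[j]) ** 2 for a, j in zip(A, idx))


def greedy_nearest(A, B):
    """Left to right, map each A[i] to the nearest B[j] that is still allowed."""
    idx, start = [], 0
    for i, a in enumerate(A):
        # Leave enough elements of B for the rest of A.
        stop = len(B) - (len(A) - i) + 1
        j = min(range(start, stop), key=lambda j: (a - B[j]) ** 2)
        idx.append(j)
        start = j + 1
    return idx


def brute_force(A, B):
    """Exhaustive search over all strictly increasing index choices."""
    best = min(combinations(range(len(B)), len(A)),
               key=lambda idx: total_cost(A, B, idx))
    return list(best)


A = [1, 2]
B = [0, 1, 10000]
for solver in (greedy_nearest, brute_force):
    idx = solver(A, B)
    print(solver.__name__, idx, total_cost(A, B, idx))
```

On this input the greedy choice is indices `[1, 2]` with cost 99960004, while the exhaustive search finds `[0, 1]` with cost 2.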
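The task is to find the valid mapping with the minimum total distance.
Is this hard to do? I am interested in a method that is fast when the lists are a few thousand elements long.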
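Here is an O(n) solution! (This is the original attempt; see below for the fixed version.)
The idea is as follows. We first solve the problem for every other element, turn that into a very close solution, and then use dynamic programming to find the real solution. This means solving a problem of half the size first, followed by O(n) work. Using the fact that x + x/2 + x/4 + ... = 2x, this turns out to be O(n) work.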
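Written out as a recurrence, the argument looks like this. The constant c is an assumption introduced for illustration: it stands for the per-element cost of the dynamic programming pass at each level.

```latex
T(n) \le T\!\left(\tfrac{n}{2}\right) + c\,n
\quad\Longrightarrow\quad
T(n) \le c\,n\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) = 2\,c\,n = O(n).
```

The bound only holds if the pass at each level really is linear, which is why the width of the band examined by the DP has to stay bounded.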
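This very much requires sorted lists. Using a band that is 5 wide is overkill; it looks like a band that is 3 wide always gives the right answer, but I was not confident enough to rely on that.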
def improve_matching(list1, list2, matching):
    # We do DP forward, trying a band that is 5 across, building up our
    # answer as a linked list. If our answer changed by no more than 1
    # anywhere, we are done. Else we recursively improve again.
    best_j_last = -1
    last = {-1: (0.0, None)}
    for i in range(len(list1)):
        best_j = None
        best_cost = None
        this = {}
        for delta in (-2, 2, -1, 1, 0):
            j = matching[i] + delta
            # Bounds sanity checks.
            if j < 0:
                continue
            elif len(list2) <= j:
                continue

            j_prev = best_j_last
            if j <= j_prev:
                if j-1 in last:
                    j_prev = j-1
                else:
                    # Can't push back this far.
                    continue

            cost = last[j_prev][0] + (list1[i] - list2[j])**2
            this[j] = (cost, [j, last[j_prev][1]])
            if (best_j is None) or cost <= best_cost:
                best_j = j
                best_cost = cost

        best_j_last = best_j
        last = this

    (final_cost, linked_list) = last[best_j_last]
    matching_rev = []
    while linked_list is not None:
        matching_rev.append(linked_list[0])
        linked_list = linked_list[1]
    matching_new = [x for x in reversed(matching_rev)]
    for i in range(len(matching_new)):
        if 1 < abs(matching[i] - matching_new[i]):
            print("Improving further")  # Does this ever happen?
            return improve_matching(list1, list2, matching_new)

    return matching_new


def match_lists(list1, list2):
    if 0 == len(list1):
        return []
    elif 1 == len(list1):
        best_j = 0
        best_cost = (list1[0] - list2[0])**2
        for j in range(1, len(list2)):
            cost = (list1[0] - list2[j])**2
            if cost < best_cost:
                best_cost = cost
                best_j = j
        return [best_j]
    elif 1 < len(list1):
        # Solve a smaller problem first.
        list1_smaller = [list1[2*i] for i in range((len(list1)+1)//2)]
        list2_smaller = [list2[2*i] for i in range((len(list2)+1)//2)]
        matching_smaller = match_lists(list1_smaller, list2_smaller)

        # Start with that matching.
        matching = [None] * len(list1)
        for i in range(len(matching_smaller)):
            matching[2*i] = 2*matching_smaller[i]

        # Fill in the holes between
        for i in range(len(matching) - 1):
            if matching[i] is None:
                best_j = matching[i-1] + 1
                best_cost = (list1[i] - list2[best_j])**2
                for j in range(best_j+1, matching[i+1]):
                    cost = (list1[i] - list2[j])**2
                    if cost < best_cost:
                        best_cost = cost
                        best_j = j
                matching[i] = best_j

        # And fill in the last one if needed
        if matching[-1] is None:
            if matching[-2] + 1 == len(list2):
                # This will be an invalid matching, but improve will fix that.
                matching[-1] = matching[-2]
            else:
                best_j = matching[-2] + 1
                best_cost = (list1[-1] - list2[best_j])**2
                for j in range(best_j+1, len(list2)):
                    cost = (list1[-1] - list2[j])**2
                    if cost < best_cost:
                        best_cost = cost
                        best_j = j
                matching[-1] = best_j

        # And now improve.
        return improve_matching(list1, list2, matching)


def best_matching(list1, list2):
    matching = match_lists(list1, list2)
    cost = 0.0
    result = []
    for i in range(len(matching)):
        pair = (list1[i], list2[matching[i]])
        result.append(pair)
        cost = cost + (pair[0] - pair[1])**2
    return (cost, result)
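The above has a bug. It can be demonstrated with match_lists([1, 3], [0, 0, 0, 0, 0, 1, 3]). However, the solution below is also O(n), and I believe it has no bugs. The difference is that instead of looking in a band of fixed width, I look in a band whose width is determined dynamically by the previous matching. Since no more than 5 entries can look to match at any given spot, the work again comes out to O(n) for this array plus a geometrically decreasing recursive call. But long stretches of the same value can no longer cause a problem.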
def match_lists(list1, list2):
    prev_matching = []

    if 0 == len(list1):
        # Trivial match
        return prev_matching
    elif 1 < len(list1):
        # Solve a smaller problem first.
        list1_smaller = [list1[2*i] for i in range((len(list1)+1)//2)]
        list2_smaller = [list2[2*i] for i in range((len(list2)+1)//2)]
        prev_matching = match_lists(list1_smaller, list2_smaller)

    best_j_last = -1
    last = {-1: (0.0, None)}
    for i in range(len(list1)):
        lowest_j = 0
        highest_j = len(list2) - 1
        if 3 < i:
            lowest_j = 2 * prev_matching[i//2 - 2]
        if i + 4 < len(list1):
            highest_j = 2 * prev_matching[i//2 + 2]

        if best_j_last == highest_j:
            # Have to push it back.
            best_j_last = best_j_last - 1

        best_cost = last[best_j_last][0] + (list1[i] - list2[highest_j])**2
        best_j = highest_j
        this = {best_j: (best_cost, [best_j, last[best_j_last][1]])}

        # Now try the others.
        for j in range(lowest_j, highest_j):
            prev_j = best_j_last
            if j <= prev_j:
                prev_j = j - 1

            if prev_j not in last:
                continue
            else:
                cost = last[prev_j][0] + (list1[i] - list2[j])**2
                this[j] = (cost, [j, last[prev_j][1]])
                if cost < best_cost:
                    best_cost = cost
                    best_j = j

        last = this
        best_j_last = best_j

    (final_cost, linked_list) = last[best_j_last]
    matching_rev = []
    while linked_list is not None:
        matching_rev.append(linked_list[0])
        linked_list = linked_list[1]
    matching_new = [x for x in reversed(matching_rev)]

    return matching_new


def best_matching(list1, list2):
    matching = match_lists(list1, list2)
    cost = 0.0
    result = []
    for i in range(len(matching)):
        pair = (list1[i], list2[matching[i]])
        result.append(pair)
        cost = cost + (pair[0] - pair[1])**2
    return (cost, result)
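I was asked to explain why this works.
Here is my heuristic understanding. In the algorithm we solve the half problem. Then we have to solve the full problem.
The question is: how far can an optimal solution of the full problem be forced away from the optimal solution of the half problem? We push it to the right by making every element of list2 that was not in the half problem as large as possible, and every element of list1 that was not in the half problem as small as possible. But if we shove the elements from the half problem to the right and put the duplicated elements where they were, then, modulo boundary effects, we have 2 optimal solutions to the half problem, and nothing has moved by more than the distance to where the next element to the right was in the half problem. Similar reasoning applies to trying to force the solution left.
Now let's discuss those boundary effects. They amount to 1 element at the end. So when we try to shove an element off the end, we cannot always do so. By looking 2 elements over instead of 1, we add enough wiggle room to account for that as well.
Hence there has to be an optimal solution that is fairly close to the half problem doubled in the obvious way. There may be others, but there is at least one. And the DP step will find it.
I would need to do some work to turn this intuition into a formal proof, but I am confident that it could be done.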
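Here is a recursive solution. Pick the middle element of a; map it to each possible element of b (leaving enough on each end to accommodate the left and right sections of a). For each such mapping, compute the single-element cost; then recur on the left and right fragments of a and b.
Here is the code; I will leave memoization as an exercise for the student.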
import math

test_case = [
    [ [1, 2], [0, 1, 10] ],
    [ [1.1, 2.3, 5.6, 5.7, 10.1], [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8] ],
]

indent = ""


def best_match(a, b):
    """
    Find the best match for elements in a mapping to b, preserving order
    """
    global indent
    indent += " "
    # print(indent, "ENTER", a, b)

    best_cost = math.inf
    best_map = []

    if len(a) == 0:
        best_cost = 0
        best_map = []
    else:
        # Match the middle element of `a` to each eligible element of `b`
        a_midpt = len(a) // 2
        a_elem = a[a_midpt]
        l_margin = a_midpt
        r_margin = a_midpt + len(b) - len(a)

        for b_pos in range(l_margin, r_margin+1):
            # For each match ...
            b_elem = b[b_pos]
            # print(indent, "TRACE", a_elem, b_elem)

            # ... compute the element cost ...
            mid_cost = (a_elem - b_elem)**2

            # ... and recur for similar alignments on left & right list fragments
            l_cost, l_map = best_match(a[:l_margin], b[:b_pos])
            r_cost, r_map = best_match(a[l_margin+1:], b[b_pos+1:])

            # Check total cost against best found; keep the best
            cand_cost = l_cost + mid_cost + r_cost
            # print(indent, " COST", mid_cost, l_cost, r_cost)
            if cand_cost < best_cost:
                best_cost = cand_cost
                best_map = l_map[:] + [(a_elem, b_elem)]
                best_map.extend(r_map[:])

    # print(indent, "LEAVE", best_cost, best_map)
    return best_cost, best_map


for a, b in test_case:
    print('\n', a, b)
    print(best_match(a, b))
Output:
a = [1, 2]
b = [0, 1, 10]
2 [(1, 0), (2, 1)]
a = [1.1, 2.3, 5.6, 5.7, 10.1]
b = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
16.709999999999997 [(1.1, 1.9), (2.3, 2.4), (5.6, 2.7), (5.7, 8.4), (10.1, 10.7)]
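The memoization left as an exercise could look roughly like the sketch below. It is one possible approach, not the answer author's code: the name `best_match_memo`, the use of `functools.lru_cache`, and the choice to key subproblems on index ranges instead of list slices are all assumptions made for illustration.

```python
from functools import lru_cache


def best_match_memo(a, b):
    """Middle-element recursion with results cached per index range."""
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def solve(a_lo, a_hi, b_lo, b_hi):
        # Best (cost, pairs) for matching a[a_lo:a_hi] into b[b_lo:b_hi].
        if a_lo == a_hi:
            return 0.0, ()
        mid = (a_lo + a_hi) // 2
        best = (float("inf"), ())
        # Leave room in b for the elements of a on either side of mid.
        first = b_lo + (mid - a_lo)
        last = b_hi - (a_hi - mid)
        for pos in range(first, last + 1):
            l_cost, l_map = solve(a_lo, mid, b_lo, pos)
            r_cost, r_map = solve(mid + 1, a_hi, pos + 1, b_hi)
            total = l_cost + (a[mid] - b[pos]) ** 2 + r_cost
            if total < best[0]:
                best = (total, l_map + ((a[mid], b[pos]),) + r_map)
        return best

    cost, pairs = solve(0, len(a), 0, len(b))
    return cost, list(pairs)


print(best_match_memo([1, 2], [0, 1, 10]))
print(best_match_memo([1.1, 2.3, 5.6, 5.7, 10.1],
                      [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]))
```

Passing index ranges instead of slices keeps the cache keys small and hashable. Because the recursion always splits at the middle of a, its depth grows only logarithmically with the length of a, so the recursion limit is unlikely to be the bottleneck; the number of cached subproblems is the more likely cost on long lists, and I have not benchmarked it.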
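For giggles and grins, here is what is hopefully a much faster solution than either of the other working ones. The idea is simple. First we do a greedy match from left to right. Then a greedy match from right to left. This gives us bounds on where each element can go. Then we can do a DP solution from left to right that only looks at the possible values.
If the greedy approaches agree, this takes linear time. If the greedy approaches are very far apart, this can take quadratic time. But the hope is that the greedy approaches produce reasonably close results, giving close to linear performance.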
def match_lists(list1, list2):
    # First we try a greedy matching from left to right.
    # This gives us, for each element, the last place it could
    # be forced to match. (It could match later, for instance
    # in a run of equal values in list2.)
    match_last = []
    j = 0
    for i in range(len(list1)):
        while True:
            if len(list2) - j <= len(list1) - i:
                # We ran out of room.
                break
            elif abs(list2[j+1] - list1[i]) <= abs(list2[j] - list1[i]):
                # Take the better value
                j = j + 1
            else:
                break
        match_last.append(j)
        j = j + 1

    # Next we try a greedy matching from right to left.
    # This gives us, for each element, the first place it could be
    # forced to match.
    # We build it in reverse order, then reverse.
    match_first_rev = []
    j = len(list2) - 1
    for i in range(len(list1) - 1, -1, -1):
        while True:
            if j <= i:
                # We ran out of room
                break
            elif abs(list2[j-1] - list1[i]) <= abs(list2[j] - list1[i]):
                # Take the better value
                j = j - 1
            else:
                break
        match_first_rev.append(j)
        j = j - 1
    match_first = [x for x in reversed(match_first_rev)]

    # And now we do DP forward, building up our answer as a linked list.
    best_j_last = -1
    last = {-1: (0.0, None)}
    for i in range(len(list1)):
        # We initialize with the last position we could choose.
        best_j = match_last[i]
        best_cost = last[best_j_last][0] + (list1[i] - list2[best_j])**2
        this = {best_j: (best_cost, [best_j, last[best_j_last][1]])}

        # Now try the rest of the range of possibilities
        for j in range(match_first[i], match_last[i]):
            j_prev = best_j_last
            if j <= j_prev:
                j_prev = j - 1  # Push back to the last place we could match
            cost = last[j_prev][0] + (list1[i] - list2[j])**2
            this[j] = (cost, [j, last[j_prev][1]])
            if cost < best_cost:
                best_cost = cost
                best_j = j
        last = this
        best_j_last = best_j

    (final_cost, linked_list) = last[best_j_last]
    matching_rev = []
    while linked_list is not None:
        # The linked list runs from the last element of list1 back to the
        # first, so index list1 from the end.
        i = len(list1) - 1 - len(matching_rev)
        matching_rev.append((list1[i], list2[linked_list[0]]))
        linked_list = linked_list[1]
    matching = [x for x in reversed(matching_rev)]
    return (final_cost, matching)


print(match_lists([1.1, 2.3, 5.6, 5.7, 10.1], [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]))
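Since an earlier attempt in this thread turned out to have a bug on inputs with repeated values, a small randomized check against exhaustive search may be worth having. The sketch below is a hypothetical harness, not part of the answer: `reference_cost` and `find_counterexamples` are names invented here, and it assumes the solver under test returns a `(cost, mapping)` pair.

```python
import random
from itertools import combinations


def reference_cost(A, B):
    """Exhaustive minimum total squared distance (tiny inputs only)."""
    return min(sum((a - B[j]) ** 2 for a, j in zip(A, idx))
               for idx in combinations(range(len(B)), len(A)))


def find_counterexamples(solver, trials=200, seed=0):
    """Return the (A, B) inputs on which solver's cost is not the optimum."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        n = rng.randint(1, 4)
        m = rng.randint(n, 7)
        # Small integers produce many ties and repeated values.
        A = sorted(rng.randint(0, 20) for _ in range(n))
        B = sorted(rng.randint(0, 20) for _ in range(m))
        got = solver(A, B)[0]
        if abs(got - reference_cost(A, B)) > 1e-9:
            failures.append((A, B))
    return failures


# A deliberately naive solver that pairs A with the first len(A) items of B.
def prefix_solver(A, B):
    return (sum((a - b) ** 2 for a, b in zip(A, B)), None)


print(len(find_counterexamples(prefix_solver)), "inputs where the naive solver is not optimal")
```

Any function with the same return shape, such as a `match_lists` that returns `(cost, matching)`, could be passed in place of `prefix_solver`; an empty result is evidence of correctness on small inputs, not a proof.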
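Python is not very friendly to recursion, so applying it to lists of thousands of elements might not fare so well. Here is a bottom-up approach. It relies on the fact that, for any a from A, the best achievable cost can only stay the same or improve as we increase the index of its potential partner from B. (Works for both sorted and unsorted input.)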
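In symbols, with zero-based indices and writing m for the table the code fills in, the recurrence can be stated as below. Here m[i][j] is read as the best cost of matching A[0..i] when A[i] is paired with some B[j'] with j' <= j; the boundary conventions in the second line are a restatement of what the code's special cases do, written out here for illustration.

```latex
m_{i,j} = \min\bigl(m_{i,j-1},\; m_{i-1,j-1} + (A_i - B_j)^2\bigr),
\qquad i \le j \le |B| - |A| + i,
\\
m_{-1,j} = 0, \qquad m_{i,i-1} = \infty, \qquad
\text{answer} = m_{|A|-1,\,|B|-1}.
```

The second component stored in each cell of the code is the index j at which the minimum was attained, which is what the backtracking step follows.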
def f(A, B):
    m = [[(float('inf'), -1) for b in B] for a in A]

    for i in range(len(A)):
        for j in range(i, len(B) - len(A) + i + 1):
            d = (A[i] - B[j]) ** 2

            if i == 0:
                if j == i:
                    m[i][j] = (d, j)
                elif d < m[i][j-1][0]:
                    m[i][j] = (d, j)
                else:
                    m[i][j] = m[i][j-1]
            # i > 0
            else:
                candidate = d + m[i-1][j-1][0]
                if j == i:
                    m[i][j] = (candidate, j)
                else:
                    if candidate < m[i][j-1][0]:
                        m[i][j] = (candidate, j)
                    else:
                        m[i][j] = m[i][j-1]

    result = m[len(A)-1][len(B)-1][0]
    # Backtrack
    lst = [None for a in A]
    j = len(B) - 1
    for i in range(len(A)-1, -1, -1):
        j = m[i][j][1]
        lst[i] = j
        j = j - 1
    return (result, [(A[i], B[j]) for i, j in enumerate(lst)])


A = [1, 2]
B = [0, 1, 10000]
print(f(A, B))
print("")
A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
print(f(A, B))
Output:
(2, [(1, 0), (2, 1)])
(16.709999999999997, [(1.1, 1.9), (2.3, 2.4), (5.6, 2.7), (5.7, 8.4), (10.1, 10.7)])
Here is an O(|B|) space implementation. I am not sure whether this still offers a way to backtrack and recover the mapping, but I am working on it.
import random
import time


def f(A, B):
    m = [(float('inf'), -1) for b in B]
    m1 = [(float('inf'), -1) for b in B]  # m[i-1]

    for i in range(len(A)):
        for j in range(i, len(B) - len(A) + i + 1):
            d = (A[i] - B[j]) ** 2

            if i == 0:
                if j == i:
                    m[j] = (d, j)
                elif d < m[j-1][0]:
                    m[j] = (d, j)
                else:
                    m[j] = m[j-1]
            # i > 0
            else:
                candidate = d + m1[j-1][0]
                if j == i:
                    m[j] = (candidate, j)
                else:
                    if candidate < m[j-1][0]:
                        m[j] = (candidate, j)
                    else:
                        m[j] = m[j-1]

        m1 = m
        m = m[:len(B) - len(A) + i + 1] + [(float('inf'), -1)] * (len(A) - i - 1)

    result = m1[len(B)-1][0]
    # Backtrack
    # This doesn't work as is
    # to get the mapping
    lst = [None for a in A]
    j = len(B) - 1
    for i in range(len(A)-1, -1, -1):
        j = m1[j][1]
        lst[i] = j
        j = j - 1
    return (result, [(A[i], B[j]) for i, j in enumerate(lst)])


A = [1, 2]
B = [0, 1, 10000]
print(f(A, B))
print("")
A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
print(f(A, B))

A = [random.uniform(0, 10000.5) for i in range(10000)]
B = [random.uniform(0, 10000.5) for i in range(15000)]
start = time.time()
print(f(A, B)[0])
end = time.time()
print(end - start)
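On the open question of recovering the mapping: one possible compromise, sketched below under my own assumptions, is to keep only two rows of costs but record one flag per DP cell saying whether the minimum was attained at that cell. This is not O(|B|) space overall; the flags take one byte per cell of the band, which is |A| * (|B| - |A| + 1) bytes. The function name `align_with_flags` and the offset indexing k = j - i are choices made for this illustration, not the answer author's method.

```python
def align_with_flags(A, B):
    """Banded DP: rolling cost rows plus one flag byte per cell for backtracking.

    Returns (total_cost, idx) where A[i] is mapped to B[idx[i]].
    """
    n = len(A)
    w = len(B) - n + 1          # number of allowed positions per row
    prev = [0.0] * w            # costs for the previous row (row -1 is all zero)
    flags = []                  # flags[i][k] == 1 if cell (i, i + k) set a new minimum
    for i in range(n):
        cur = [0.0] * w
        row = bytearray(w)
        for k in range(w):
            cand = prev[k] + (A[i] - B[i + k]) ** 2
            if k == 0 or cand < cur[k - 1]:
                cur[k] = cand
                row[k] = 1
            else:
                cur[k] = cur[k - 1]   # carry the running minimum forward
        flags.append(row)
        prev = cur

    # Walk back from the last row; the offset k can only stay or shrink.
    idx = [0] * n
    k = w - 1
    for i in range(n - 1, -1, -1):
        while not flags[i][k]:
            k -= 1
        idx[i] = i + k
    return prev[w - 1], idx


print(align_with_flags([1, 2], [0, 1, 10000]))
print(align_with_flags([1.1, 2.3, 5.6, 5.7, 10.1],
                       [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]))
```

On the two examples from the question this yields indices `[0, 1]` with cost 2 and indices `[1, 2, 3, 4, 6]` with cost about 16.71, matching the output of the full-table version above. Getting the mapping in truly linear space would need a different technique, such as a divide-and-conquer over the rows, which is not attempted here.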