
How to align two lists of numbers

I have two sorted lists of numbers, A and B, where B is at least as long as A. Say:

A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]

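I would like to associate each number in A with a different number in B, while preserving order. For any such mapping, we define the total distance as the sum of the squared distances between the mapped numbers.

For example:

If we map 1.1 to 0, then 2.3 can be mapped to any number from 1.9 onwards. But if we had mapped 1.1 to 2.7, then 2.3 could only be mapped to a number in B from 8.4 onwards.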

Say we map 1.1->0, 2.3->1.9, 5.6->8.4, 5.7->9.1, 10.1->10.7. This is a valid mapping and has distance (1.1^2+0.4^2+2.8^2+3.4^2+0.6^2).
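As a quick check of the arithmetic, the total distance of a given mapping can be computed directly. This is only a minimal sketch; the helper name `total_distance` is made up for illustration and is not part of the original post.

```python
def total_distance(pairs):
    """Sum of squared differences over the (a, b) pairs of a mapping."""
    return sum((a - b) ** 2 for a, b in pairs)


# The example mapping from the question.
mapping = [(1.1, 0), (2.3, 1.9), (5.6, 8.4), (5.7, 9.1), (10.1, 10.7)]
print(total_distance(mapping))  # about 21.13, up to floating-point rounding
```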

Another example, to show that a greedy approach will not work:

 A = [1, 2]
 B = [0, 1, 10000]

If we map 1->1, then we have to map 2->10000, which is bad.
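To see the failure concretely, here is a sketch of one naive greedy rule (take the nearest still-available element of B, left to right). The function `greedy_match` is a hypothetical helper written for this illustration; other greedy variants exist and may behave differently.

```python
def greedy_match(A, B):
    """Map each a to the nearest still-available b, scanning left to right."""
    pairs = []
    start = 0
    for i, a in enumerate(A):
        # Leave enough elements of B for the remaining elements of A.
        stop = len(B) - (len(A) - i - 1)
        j = min(range(start, stop), key=lambda k: (a - B[k]) ** 2)
        pairs.append((a, B[j]))
        start = j + 1
    return pairs


pairs = greedy_match([1, 2], [0, 1, 10000])
cost = sum((a - b) ** 2 for a, b in pairs)
print(pairs, cost)  # [(1, 1), (2, 10000)] 99960004
```

The mapping 1->0, 2->1 costs only 2, so the locally best first choice is very far from optimal here.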

The task is to find the valid mapping with the minimal total distance.
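For tiny inputs the task can be pinned down with an exhaustive search, which is handy as a reference when testing faster methods. This is a sketch under the assumption that trying every increasing choice of indices is acceptable (it is exponential, so it is not an answer to the speed question); `brute_force_match` is a name invented here.

```python
from itertools import combinations


def brute_force_match(A, B):
    """Try every order-preserving choice of len(A) indices into B."""
    best_cost, best_pairs = float("inf"), []
    for idx in combinations(range(len(B)), len(A)):
        # combinations() yields indices in increasing order, so the
        # mapping automatically preserves order and uses distinct b's.
        cost = sum((a - B[j]) ** 2 for a, j in zip(A, idx))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(a, B[j]) for a, j in zip(A, idx)]
    return best_cost, best_pairs


print(brute_force_match([1, 2], [0, 1, 10000]))  # (2, [(1, 0), (2, 1)])
```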

Is this hard to do? I am interested in a method that is fast when the lists have a few thousand elements each.

Here is an O(n) solution! (This is the original attempt; see below for a fixed version.)

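The idea is as follows. We first solve the problem for every other element, turn that into a solution that is very close to the right one, then use dynamic programming to find the real solution. This amounts to first solving a problem of half the size, followed by O(n) work. Using the fact that x + x/2 + x/4 + ... = 2x, the total turns out to be O(n) work.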
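The geometric-series argument can be checked numerically. The sketch below only adds up the problem sizes across recursion levels, assuming each level does work proportional to its size; `total_work` is an illustrative name, not something from the answer's code.

```python
def total_work(n):
    """Sum the problem sizes n, ceil(n/2), ceil(n/4), ... down to 1.

    Returns (total size across all levels, number of levels).
    """
    total = 0
    levels = 0
    while True:
        total += n
        levels += 1
        if n <= 1:
            break
        n = (n + 1) // 2  # same halving as keeping every other element
    return total, levels


print(total_work(1024))  # (2047, 11)
```

Because of the rounding up, the total can exceed 2n by a small amount (at most one per level), which does not change the O(n) conclusion.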

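This very, very much requires sorted lists. Using a band that is 5 wide is overkill; it looks like a band that is 3 wide always gives the right answer, but I was not confident enough to go with that.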

def improve_matching (list1, list2, matching):
    # We do DP forward, trying a band that is 5 across, building up our
    # answer as a linked list.  If our answer changed by no more than 1
    # anywhere, we are done.  Else we recursively improve again.
    best_j_last = -1
    last = {-1: (0.0, None)}
    for i in range(len(list1)):
        best_j = None
        best_cost = None
        this = {}
        for delta in (-2, 2, -1, 1, 0):
            j = matching[i] + delta
            # Bounds sanity checks.
            if j < 0:
                continue
            elif len(list2) <= j:
                continue

            j_prev = best_j_last
            if j <= j_prev:
                if j-1 in last:
                    j_prev = j-1
                else:
                    # Can't push back this far.
                    continue

            cost = last[j_prev][0] + (list1[i] - list2[j])**2
            this[j] = (cost, [j, last[j_prev][1]])
            if (best_j is None) or cost <= best_cost:
                best_j = j
                best_cost = cost

        best_j_last = best_j
        last = this

    (final_cost, linked_list) = last[best_j_last]
    matching_rev = []
    while linked_list is not None:
        matching_rev.append( linked_list[0])
        linked_list = linked_list[1]
    matching_new = [x for x in reversed(matching_rev)]
    for i in range(len(matching_new)):
        if 1 < abs(matching[i] - matching_new[i]):
            print("Improving further") # Does this ever happen?
            return improve_matching(list1, list2, matching_new)

    return matching_new

def match_lists (list1, list2):
    if 0 == len(list1):
        return []
    elif 1 == len(list1):
        best_j = 0
        best_cost = (list1[0] - list2[0])**2
        for j in range(1, len(list2)):
            cost = (list1[0] - list2[j])**2
            if cost < best_cost:
                best_cost = cost
                best_j = j
        return [best_j]
    elif 1 < len(list1):
        # Solve a smaller problem first.
        list1_smaller = [list1[2*i] for i in range((len(list1)+1)//2)]
        list2_smaller = [list2[2*i] for i in range((len(list2)+1)//2)]
        matching_smaller = match_lists(list1_smaller, list2_smaller)

        # Start with that matching.
        matching = [None] * len(list1)
        for i in range(len(matching_smaller)):
            matching[2*i] = 2*matching_smaller[i]

        # Fill in the holes between
        for i in range(len(matching) - 1):
            if matching[i] is None:
                best_j = matching[i-1] + 1
                best_cost = (list1[i] - list2[best_j])**2
                for j in range(best_j+1, matching[i+1]):
                    cost = (list1[i] - list2[j])**2
                    if cost < best_cost:
                        best_cost = cost
                        best_j = j
                matching[i] = best_j

        # And fill in the last one if needed
        if matching[-1] is None:
            if matching[-2] + 1 == len(list2):
                # This will be an invalid matching, but improve will fix that.
                matching[-1] = matching[-2]
            else:
                best_j = matching[-2] + 1
                best_cost = (list1[-1] - list2[best_j])**2
                for j in range(best_j+1, len(list2)):
                    cost = (list1[-1] - list2[j])**2
                    if cost < best_cost:
                        best_cost = cost
                        best_j = j
                matching[-1] = best_j

        # And now improve.
        return improve_matching(list1, list2, matching)

def best_matching (list1, list2):
    matching = match_lists(list1, list2)
    cost = 0.0
    result = []
    for i in range(len(matching)):
        pair = (list1[i], list2[matching[i]])
        result.append(pair)
        cost = cost + (pair[0] - pair[1])**2
    return (cost, result)

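Update

There is a bug in the above. It can be demonstrated with match_lists([1, 3], [0, 0, 0, 0, 0, 1, 3]). However, the solution below is also O(n), and I believe it has no bugs. The difference is that instead of looking for a band of fixed width, I look for a band whose width is determined dynamically by the previous matching. Since no more than 5 entries can look like a match at any given spot, this again winds up O(n) for this array plus a geometrically decreasing recursive call. But long stretches of the same value cannot cause a problem.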

def match_lists (list1, list2):
    prev_matching = []

    if 0 == len(list1):
        # Trivial match
        return prev_matching
    elif 1 < len(list1):
        # Solve a smaller problem first.
        list1_smaller = [list1[2*i] for i in range((len(list1)+1)//2)]
        list2_smaller = [list2[2*i] for i in range((len(list2)+1)//2)]
        prev_matching = match_lists(list1_smaller, list2_smaller)

    best_j_last = -1
    last = {-1: (0.0, None)}
    for i in range(len(list1)):
        lowest_j = 0
        highest_j = len(list2) - 1
        if 3 < i:
            lowest_j = 2 * prev_matching[i//2 - 2]
        if i + 4 < len(list1):
            highest_j = 2 * prev_matching[i//2 + 2]

        if best_j_last == highest_j:
            # Have to push it back.
            best_j_last = best_j_last - 1

        best_cost = last[best_j_last][0] + (list1[i] - list2[highest_j])**2
        best_j = highest_j
        this = {best_j: (best_cost, [best_j, last[best_j_last][1]])}

        # Now try the others.
        for j in range(lowest_j, highest_j):
            prev_j = best_j_last
            if j <= prev_j:
                prev_j = j - 1

            if prev_j not in last:
                continue
            else:
                cost = last[prev_j][0] + (list1[i] - list2[j])**2
                this[j] = (cost, [j, last[prev_j][1]])
                if cost < best_cost:
                    best_cost = cost
                    best_j = j

        last = this
        best_j_last = best_j

    (final_cost, linked_list) = last[best_j_last]
    matching_rev = []
    while linked_list is not None:
        matching_rev.append( linked_list[0])
        linked_list = linked_list[1]
    matching_new = [x for x in reversed(matching_rev)]

    return matching_new

def best_matching (list1, list2):
    matching = match_lists(list1, list2)
    cost = 0.0
    result = []
    for i in range(len(matching)):
        pair = (list1[i], list2[matching[i]])
        result.append(pair)
        cost = cost + (pair[0] - pair[1])**2
    return (cost, result)

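Notes

I was asked to explain why this works.

Here is my heuristic understanding. In the algorithm we solve the half problem. Then we have to solve the full problem.

The question is, how far can an optimal solution for the full problem be forced to be from the optimal solution to the half problem? We push it to the right by making every element of list2 that was not in the half problem as large as possible, and every element of list1 that was not in the half problem as small as possible. But if we shove the elements from the half problem to the right, and put the duplicated elements where they were, then, modulo boundary effects, we have 2 optimal solutions to the half problem, and nothing has moved by more than to where the next element to the right was in the half problem. Similar reasoning applies to trying to force the solution to the left.

Now let's discuss those boundary effects. Those boundary effects are off by 1 element at the end. So when we try to shove an element off the end, we cannot always do so. By looking 2 elements over instead of 1, we add enough wiggle room to account for that as well.

Hence there has to be an optimal solution that is fairly close to the half problem doubled in the obvious way. There may be others, but there is at least one. And the DP step will find it.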

I would need to do some work to turn this intuition into a formal proof, but I am confident that it can be done.
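To make "fairly close to the half problem doubled" concrete, the window of list2 indices that the second match_lists listing tries for each i can be pulled out into a helper. This is a restatement of the lowest_j / highest_j logic from that listing; the function name `candidate_band` and the sample `prev` matching are made up for illustration.

```python
def candidate_band(i, prev_matching, len1, len2):
    """Inclusive range of list2 indices tried for list1[i].

    prev_matching[k] is the index (in the halved list2) matched to the
    k-th element of the halved list1, so 2 * prev_matching[k] is an
    index into the full list2.
    """
    lowest_j = 0
    highest_j = len2 - 1
    if 3 < i:
        # Two half-problem elements to the left.
        lowest_j = 2 * prev_matching[i // 2 - 2]
    if i + 4 < len1:
        # Two half-problem elements to the right.
        highest_j = 2 * prev_matching[i // 2 + 2]
    return lowest_j, highest_j


# Made-up half-problem matching for a list1 of length 10 and a list2 of length 16.
prev = [0, 2, 3, 5, 6]
for i in (0, 5, 9):
    print(i, candidate_band(i, prev, 10, 16))
# 0 (0, 6)
# 5 (0, 12)
# 9 (6, 15)
```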

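Here is a recursive solution. Pick the middle element of a; map it to each possible element of b (leaving enough room on each end to accommodate the left and right sections of a). For each such mapping, compute the single-element cost; then recur on the left and right fragments of a and b.

Here is the code; I will leave memoization as an exercise for the student.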

test_case = [
    [ [1, 2], [0, 1, 10] ],
    [ [1.1, 2.3, 5.6, 5.7, 10.1], [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8] ],
]

import math
indent = ""


def best_match(a, b):
    """
    Find the best match for elements in a mapping to b, preserving order
    """

    global indent
    indent += "  "
    # print(indent, "ENTER", a, b)

    best_cost = math.inf
    best_map = []

    if len(a) == 0:
        best_cost = 0
        best_map = []

    else:

        # Match the middle element of `a` to each eligible element of `b`
        a_midpt = len(a) // 2
        a_elem = a[a_midpt]
        l_margin = a_midpt
        r_margin = a_midpt + len(b) - len(a) 

        for b_pos in range(l_margin, r_margin+1):
            # For each match ...
            b_elem = b[b_pos]
            # print(indent, "TRACE", a_elem, b_elem)

            # ... compute the element cost ...
            mid_cost = (a_elem - b_elem)**2

            # ... and recur for similar alignments on left & right list fragments
            l_cost, l_map = best_match(a[:l_margin], b[:b_pos])
            r_cost, r_map = best_match(a[l_margin+1:], b[b_pos+1:])

            # Check total cost against best found; keep the best
            cand_cost = l_cost + mid_cost + r_cost
            # print(indent, " COST", mid_cost, l_cost, r_cost)
            if cand_cost < best_cost:
                best_cost = cand_cost
                best_map = l_map[:] + [(a_elem, b_elem)]
                best_map.extend(r_map[:])

    # print(indent, "LEAVE", best_cost, best_map)
    return best_cost, best_map


for a, b in test_case:
    print('\n', a, b)
    print(best_match(a, b))

Output:

 a = [1, 2] 
 b = [0, 1, 10]
2 [(1, 0), (2, 1)]

 a = [1.1, 2.3, 5.6, 5.7, 10.1] 
 b = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
16.709999999999997 [(1.1, 1.9), (2.3, 2.4), (5.6, 2.7), (5.7, 8.4), (10.1, 10.7)]
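The memoization left as an exercise might look like the sketch below. It is one possible approach, not the answer author's: it assumes the recursion is re-expressed over index ranges (so that subproblems are hashable) and caches them with `functools.lru_cache`. The names `best_match_memo` and `solve` are invented here.

```python
import math
from functools import lru_cache


def best_match_memo(a, b):
    """Same middle-element recursion, cached on (a_lo, a_hi, b_lo, b_hi)."""
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def solve(a_lo, a_hi, b_lo, b_hi):
        # Best (cost, mapping) for matching a[a_lo:a_hi] into b[b_lo:b_hi].
        n = a_hi - a_lo
        if n == 0:
            return 0.0, ()
        mid = a_lo + n // 2
        left_need = mid - a_lo        # elements of a left of the middle
        right_need = a_hi - mid - 1   # elements of a right of the middle
        best = (math.inf, ())
        for b_pos in range(b_lo + left_need, b_hi - right_need):
            l_cost, l_map = solve(a_lo, mid, b_lo, b_pos)
            r_cost, r_map = solve(mid + 1, a_hi, b_pos + 1, b_hi)
            cost = l_cost + (a[mid] - b[b_pos]) ** 2 + r_cost
            if cost < best[0]:
                best = (cost, l_map + ((a[mid], b[b_pos]),) + r_map)
        return best

    cost, mapping = solve(0, len(a), 0, len(b))
    return cost, list(mapping)


print(best_match_memo([1, 2], [0, 1, 10]))  # (2.0, [(1, 0), (2, 1)])
```

Splitting at the middle keeps the recursion depth logarithmic in len(a), so the recursion limit should not be an issue; how much the cache helps in practice depends on how many index ranges are actually revisited.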

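For giggles, here is what should hopefully be a much faster solution than either of the other working ones. The idea is simple. First we do a greedy match from left to right. Then a greedy match from right to left. This gives us bounds on where each element can go. Then we can do a DP solution from left to right, looking only at the possible values.

If the two greedy approaches agree, this will take linear time. If the greedy approaches are very far apart, it can take quadratic time. But the hope is that the greedy approaches produce reasonably close results, resulting in close to linear performance.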

def match_lists(list1, list2):
    # First we try a greedy matching from left to right.
    # This gives us, for each element, the last place it could
    # be forced to match. (It could match later, for instance
    # in a run of equal values in list2.)
    match_last = []
    j = 0
    for i in range(len(list1)):
        while True:
            if len(list2) - j <= len(list1) - i:
                # We ran out of room.
                break
            elif abs(list2[j+1] - list1[i]) <= abs(list2[j] - list1[i]):
                # Take the better value
                j = j + 1
            else:
                break
        match_last.append(j)
        j = j + 1

    # Next we try a greedy matching from right to left.
    # This gives us, for each element, the first place it could be
    # forced to match.
    # We build it in reverse order, then reverse.
    match_first_rev = []
    j = len(list2) - 1
    for i in range(len(list1) - 1, -1, -1):
        while True:
            if j <= i:
                # We ran out of room
                break
            elif abs(list2[j-1] - list1[i]) <= abs(list2[j] - list1[i]):
                # Take the better value
                j = j - 1
            else:
                break
        match_first_rev.append(j)
        j = j - 1
    match_first = [x for x in reversed(match_first_rev)]

    # And now we do DP forward, building up our answer as a linked list.
    best_j_last = -1
    last = {-1: (0.0, None)}
    for i in range(len(list1)):
        # We initialize with the last position we could choose.
        best_j = match_last[i]
        best_cost = last[best_j_last][0] + (list1[i] - list2[best_j])**2
        this = {best_j: (best_cost, [best_j, last[best_j_last][1]])}

        # Now try the rest of the range of possibilities
        for j in range(match_first[i], match_last[i]):
            j_prev = best_j_last
            if j <= j_prev:
                j_prev = j - 1 # Push back to the last place we could match
            cost = last[j_prev][0] + (list1[i] - list2[j])**2
            this[j] = (cost, [j, last[j_prev][1]])
            if cost < best_cost:
                best_cost = cost
                best_j = j
        last = this
        best_j_last = best_j

    (final_cost, linked_list) = last[best_j_last]
    matching_rev = []
    while linked_list is not None:
        # The linked list runs from the last element of list1 back to the first.
        matching_rev.append(
                (list1[len(list1) - 1 - len(matching_rev)], list2[linked_list[0]]))
        linked_list = linked_list[1]
    matching = [x for x in reversed(matching_rev)]
    return (final_cost, matching)

print(match_lists([1.1, 2.3, 5.6, 5.7, 10.1], [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]))
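The linear-versus-quadratic claim comes down to how wide the windows between the two greedy passes are. As a rough gauge, one can count the (i, j) pairs the DP loop visits. The helper `dp_cells` below is hypothetical; it only assumes per-element inclusive windows shaped like match_first[i]..match_last[i] in the listing above.

```python
def dp_cells(match_first, match_last):
    """Number of (i, j) pairs visited when each i scans its inclusive window."""
    return sum(last - first + 1 for first, last in zip(match_first, match_last))


# Windows for list1 = [1, 2], list2 = [0, 1, 10000], traced by hand
# through the two greedy passes.
print(dp_cells([0, 1], [1, 2]))  # 4

# Made-up worst case: 3 elements whose windows each span all the slack
# in a list2 of length 6.
print(dp_cells([0, 1, 2], [3, 4, 5]))  # 12
```

When both greedy passes agree, every window has width 1 and the count equals len(list1).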

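Python is not very friendly to recursion, so applying it to lists with thousands of elements might not fare so well. Here is a bottom-up approach that takes advantage of the optimal solution for any a from A being non-decreasing as we increase the index of its potential partner from B. (It works for both sorted and unsorted input.)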

def f(A, B):
  # m[i][j] = (best cost of matching A[0..i] into B[0..j],
  #            index in B that A[i] is matched to in that solution)
  m = [[(float('inf'), -1) for b in B] for a in A]

  for i in range(len(A)):
    for j in range(i, len(B) - len(A) + i + 1):
      d = (A[i] - B[j]) ** 2

      if i == 0:
        if j == i:
          m[i][j] = (d, j)
        elif d < m[i][j-1][0]:
          m[i][j] = (d, j)
        else:
          m[i][j] = m[i][j-1]
      # i > 0
      else:
        candidate = d + m[i-1][j-1][0]
        if j == i:
          m[i][j] = (candidate, j)
        else:
          if candidate < m[i][j-1][0]:
            m[i][j] = (candidate, j)
          else:
            m[i][j] = m[i][j-1]

  result = m[len(A)-1][len(B)-1][0]
  # Backtrack
  lst = [None for a in A]
  j = len(B) - 1
  for i in range(len(A)-1, -1, -1):
    j = m[i][j][1]
    lst[i] = j
    j = j - 1
  return (result, [(A[i], B[j]) for i, j in enumerate(lst)])

A = [1, 2]
B = [0, 1, 10000]
print(f(A, B))
print()
A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
print(f(A, B))

Output:

(2, [(1, 0), (2, 1)])

(16.709999999999997, [(1.1, 1.9), (2.3, 2.4), (5.6, 2.7), (5.7, 8.4), (10.1, 10.7)])
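Read off the code above, the table appears to follow the recurrence below, where m_{i,j} is the best cost of matching A[0..i] into B[0..j]. This is a restatement of the listing under the conventions m_{-1,j} = 0 and m_{i,i-1} = \infty, not something the answer's author wrote out.

```latex
m_{i,j} \;=\; \min\Bigl( m_{i,j-1},\; (A_i - B_j)^2 + m_{i-1,j-1} \Bigr),
\qquad i \le j \le |B| - |A| + i
```

The first term corresponds to leaving B_j unused by A_i, the second to matching A_i with B_j; the answer is m_{|A|-1,\,|B|-1}.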

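Update

Here is an O(|B|) space implementation. I am not sure whether this still offers a way to backtrack and recover the mapping, but I am working on it.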

def f(A, B):
  m = [(float('inf'), -1) for b in B]
  m1 = [(float('inf'), -1) for b in B] # m[i-1]

  for i in range(len(A)):
    for j in range(i, len(B) - len(A) + i + 1):
      d = (A[i] - B[j]) ** 2

      if i == 0:
        if j == i:
          m[j] = (d, j)
        elif d < m[j-1][0]:
          m[j] = (d, j)
        else:
          m[j] = m[j-1]
      # i > 0
      else:
        candidate = d + m1[j-1][0]
        if j == i:
          m[j] = (candidate, j)
        else:
          if candidate < m[j-1][0]:
            m[j] = (candidate, j)
          else:
            m[j] = m[j-1]

    # Keep the finished row as m[i-1] and start a fresh list for the next row.
    m1 = m
    m = m[:len(B) - len(A) + i + 1] + [(float('inf'), -1)] * (len(A) - i - 1)

  result = m1[len(B)-1][0]
  # Backtrack
  # This doesn't work as is
  # to get the mapping
  lst = [None for a in A]
  j = len(B) - 1
  for i in range(len(A)-1, -1, -1):
    j = m1[j][1]
    lst[i] = j
    j = j - 1
  return (result, [(A[i], B[j]) for i, j in enumerate(lst)])

A = [1, 2]
B = [0, 1, 10000]
print(f(A, B))
print()
A = [1.1, 2.3, 5.6, 5.7, 10.1]
B = [0, 1.9, 2.4, 2.7, 8.4, 9.1, 10.7, 11.8]
print(f(A, B))

import random
import time

A = [random.uniform(0, 10000.5) for i in range(10000)]
B = [random.uniform(0, 10000.5) for i in range(15000)]

start = time.time()
print(f(A, B)[0])
end = time.time()
print(end - start)
