How to reduce/optimize memory usage when calculating the area of a skyline?

Question

I am trying to calculate the area of a skyline (overlapping rectangles that share the same baseline).

building_count = int(input())
items = {} # dictionary, location on x axis is the key, height is the value
count = 0 # total area
for j in range(building_count):
    line = input().split(' ')
    H = int(line[0]) # height
    L = int(line[1]) # left point (start of the building)
    R = int(line[2]) # right point (end of the building)
    for k in range(R - L):
        if not (L+k in  items): # if it's not there, add it
            items[L+k] = H
        elif H > items[L+k]: # if we have a higher building on that index
            items[L+k] = H
for value in items.values(): # we add each column basically
    count += value
print(count)

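Sample input:

The output is 29.

The problem is memory efficiency: when there are lots of values, the script simply throws MemoryError. Does anyone have ideas for optimizing the memory usage?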

Answer 1

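You are allocating a separate key-value pair for every single integer value in your range. Imagine the case where L = 1 and R = 100000. Your items dictionary will be filled with about 100000 items. Your basic idea of processing/removing the overlaps is sound, but the way you do it is massive overkill.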
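To put a number on that, here is a minimal sketch that mirrors the inner loop from the question for one hypothetical building with left edge 1 and right edge 100000. The helper name `per_integer_entries` and the `(h, left, right)` tuple layout are my own choices for illustration.

```python
def per_integer_entries(buildings):
    """Build the same per-integer dictionary as the loop in the question."""
    items = {}
    for h, left, right in buildings:
        for k in range(right - left):
            x = left + k
            # Keep the tallest height seen so far at this x position
            if x not in items or h > items[x]:
                items[x] = h
    return items

# One hypothetical building: height 1, spanning x = 1 .. 100000
items = per_integer_entries([(1, 1, 100_000)])
print(len(items))  # 99999 entries to describe a single rectangle
```

Three numbers describe the rectangle completely, yet the dictionary grows with its width rather than with the number of buildings.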

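Like so many other things in life, this is a graph problem in disguise. Picture the vertices as the rectangles you are trying to process and the (weighted) edges as the overlaps between them. The complication is that you cannot just add up the areas of the vertices and subtract the areas of the overlaps, because many of the overlaps overlap each other as well. The overlap issue can be resolved by applying a transformation that converts two overlapping rectangles into non-overlapping rectangles, effectively cutting the edge that connects them. The transformation is shown in the figure below. Notice that in some cases one of the vertices is removed as well, which simplifies the graph, while in another case a new vertex is added: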

[Figure: the overlap transformation. Green: the overlap to be chopped out.]

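Normally, if we have m rectangles and n overlaps between them, constructing the graph would be an O(m²) operation, because we would have to check all vertices against each other for overlaps. However, we can bypass the construction of the input graph entirely and get an O(m + n) traversal algorithm. That is optimal, since we analyze each rectangle only once and construct the overlap-free output graph as efficiently as possible. O(m + n) assumes that your input rectangles are sorted by their left edges in ascending order. If that is not the case, the algorithm is O(m log(m) + n) to account for the initial sorting step. Note that as the graph density increases, n goes from ~m to ~m². This confirms the intuitive idea that the fewer overlaps there are, the closer the process runs to O(m) time, while the more overlaps there are, the closer it runs to O(m²) time.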

The space complexity of the proposed algorithm is O(m): each rectangle in the input results in at most two rectangles in the output, and 2m = O(m).

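Enough about complexity analysis, on to the algorithm itself. The input is a sequence of rectangles defined by L, R, H. I will assume that the input is sorted by the left edge L. The output graph is a linked list of rectangles defined by the same parameters, sorted in descending order by right edge. The head of the list is the right-most rectangle. The output has no overlaps between any of its rectangles, so the total area of the skyline is just the sum of H * (R - L) over the ~m output rectangles.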
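Once the rectangles no longer overlap, the area really is just that sum. A minimal sketch, assuming each rectangle is a plain `(l, r, h)` tuple (the function name `total_area` is only illustrative):

```python
def total_area(rects):
    """Sum h * (r - l) over rectangles that are assumed not to overlap."""
    return sum(h * (r - l) for l, r, h in rects)

# Non-overlapping pieces, listed from right to left
pieces = [(6, 8, 3), (4, 6, 2), (2, 4, 4), (0, 1, 2), (-3, 0, 3)]
print(total_area(pieces))  # 29
```

The function does no overlap handling at all, so it only gives the right answer on input that has already been de-overlapped.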

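The reason for choosing a linked list is that the only two operations we need are iteration from the head node and the cheapest possible insertion that keeps the list in sorted order. The sorting is done as part of the overlap checking, so we do not need any kind of binary search through the list or anything like that.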

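Since the input list is ordered by increasing left edge and the output list is ordered by decreasing right edge, we can guarantee that each added rectangle is checked only against the rectangles it actually overlaps¹. We do the overlap checking and chopping shown in the figure above until we reach a rectangle whose left edge is less than or equal to the left edge of the new rectangle. All further rectangles in the output list are guaranteed not to overlap the new rectangle. This check-and-chop operation guarantees that each overlap is visited at most once and that no non-overlapping rectangles are processed unnecessarily, which makes the algorithm optimal.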

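Before I show the code, here is a diagram of the algorithm in action. Red rectangles are the new rectangles; notice that their left edges progress to the right. Blue rectangles are ones that have already been added and overlap the new rectangle. Black rectangles have already been added and do not overlap the new one. The numbering represents the order of the output list, which always starts from the right. A linked list is a perfect structure for maintaining this progression, since it allows cheap insertions and replacements.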

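Here is an implementation of the algorithm. It assumes that the input coordinates are passed in as an iterable of objects with the attributes l, r and h. The iteration order is assumed to be sorted by left edge. If that is not the case, apply sorted or list.sort to the input first: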

from collections import namedtuple

# Defined in this order so you can sort a list by left edge without a custom key
Rect = namedtuple('Rect', ['l', 'r', 'h'])

class LinkedList:
    """
    Implements a singly-linked list with mutable nodes and an iterator.
    """
    __slots__ = ['value', 'next']

    def __init__(self, value=None, next=None):
        self.value = value
        self.next = next

    def __iter__(self):
        """
        Iterate over the *nodes* in the list, starting with this one.

        The `value` and `next` attribute of any node may be modified
        during iteration.
        """
        while self:
            yield self
            self = self.next

    def __str__(self):
        """
        Provided for inspection purposes.

        Works well with `namedtuple` values.
        """
        return ' -> '.join(repr(x.value) for x in self)


def process_skyline(skyline):
    """
    Turns an iterable of rectangles sharing a common baseline into a
    `LinkedList` of rectangles containing no overlaps.

    The input is assumed to be sorted in ascending order by left edge.
    Each element of the input must have the attributes `l`, `r`, `h`.

    The output will be sorted in descending order by right edge.

    Return `None` if the input is empty.
    """
    def intersect(r1, r2, default=None):
        """
        Return (1) a flag indicating the order of `r1` and `r2`,
        (2) a linked list of between one and three non-overlapping
        rectangles covering the exact same area as `r1` and `r2`,
        (3) a pointer to the last node, and (4) a pointer to the
        second-to-last node, or `default` if there is only one node.

        The flag is set to True if the left edge of `r2` is strictly less
        than the left edge of `r1`. That would indicate that the left-most
        (last) chunk of the tuple came from `r2` instead of `r1`. For the
        algorithm as a whole, that means that we need to keep checking for
        overlaps.

        The resulting list is always returned sorted descending by the
        right edge. The input rectangles will not be modified. If they are
        not returned as-is, a `Rect` object will be used instead.
        """
        # Swap so left edge of r1 <= left edge of r2
        if r1.l > r2.l:
            r1, r2 = r2, r1
            swapped = True
        else:
            swapped = False

        if r2.l >= r1.r:
            # case 0: no overlap at all
            last = LinkedList(r1)
            s2l = result = LinkedList(r2, last)
        elif r1.r < r2.r:
            # case 1: simple overlap
            if r1.h > r2.h:
                # Chop r2
                r2 = Rect(r1.r, r2.r, r2.h)
            else:
                r1 = Rect(r1.l, r2.l, r1.h)
            last = LinkedList(r1)
            s2l = result = LinkedList(r2, last)
        elif r1.h < r2.h:
            # case 2: split into 3
            r1a = Rect(r1.l, r2.l, r1.h)
            r1b = Rect(r2.r, r1.r, r1.h)
            last = LinkedList(r1a)
            s2l = LinkedList(r2, last)
            result = LinkedList(r1b, s2l)
        else:
            # case 3: complete containment
            result = LinkedList(r1)
            last = result
            s2l = default

        return swapped, result, last, s2l

    root = LinkedList()

    skyline = iter(skyline)
    try:
        # Add the first node as-is
        root.next = LinkedList(next(skyline))
    except StopIteration:
        # Empty input iterator
        return None

    for new_rect in skyline:
        prev = root
        for rect in root.next:
            need_to_continue, replacement, last, second2last = \
                    intersect(rect.value, new_rect, prev)
            # Replace the rectangle with the de-overlapped regions
            prev.next = replacement
            if not need_to_continue:
                # Retain the remainder of the list
                last.next = rect.next
                break
            # Force the iterator to move on to the last node
            new_rect = last.value
            prev = second2last

    return root.next

Computing the total area is now trivial:

skyline = [
    Rect(-3, 0, 3), Rect(-1, 1, 2), Rect(2, 4, 4),
    Rect(3, 7, 2), Rect(6, 8, 3),
]
processed = process_skyline(skyline)
area = sum((x.value.r - x.value.l) * x.value.h for x in processed) if processed else None

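Notice the changed order of the input parameters (h moved to the end). The resulting area is 29, which matches what I get by doing the computation by hand. You can also print the processed list: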

>>> print(processed)
Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) -> Rect(l=2, r=4, h=4) ->
Rect(l=0, r=1, h=2) -> Rect(l=-3, r=0, h=3)

This is what you would expect from the diagram of the inputs and outputs.

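As an additional verification, I added a new building, Rect(-4, 9, 1), to the start of the list. It overlaps all the others and adds three units to area, for a final result of 32. processed comes out as: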

Rect(l=8, r=9, h=1) -> Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) ->
Rect(l=2, r=4, h=4) -> Rect(l=1, r=2, h=1) -> Rect(l=0, r=1, h=2) ->
Rect(l=-3, r=0, h=3) -> Rect(l=-4, r=-3, h=1)
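For an independent cross-check of both totals, here is a small brute-force sketch. It is not the linked-list algorithm above: it cuts the x axis at every rectangle edge and takes the tallest rectangle covering each strip. It assumes plain `(l, r, h)` tuples with non-negative heights, and the name `skyline_area_bruteforce` is my own. Memory stays at O(m), but the time is O(m²), so it is only meant for testing.

```python
def skyline_area_bruteforce(rects):
    """Area under the skyline of (l, r, h) rectangles on a common baseline."""
    rects = list(rects)
    # Every left or right edge is a possible change point of the skyline
    xs = sorted({x for l, r, _ in rects for x in (l, r)})
    area = 0
    for x0, x1 in zip(xs, xs[1:]):
        # Tallest rectangle that spans the whole strip [x0, x1]; 0 if none does
        tallest = max((h for l, r, h in rects if l <= x0 and x1 <= r), default=0)
        area += (x1 - x0) * tallest
    return area

skyline = [(-3, 0, 3), (-1, 1, 2), (2, 4, 4), (3, 7, 2), (6, 8, 3)]
print(skyline_area_bruteforce(skyline))                 # 29
print(skyline_area_bruteforce([(-4, 9, 1)] + skyline))  # 32
```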

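Notes:

Although I am sure this problem has been solved many times over, the solution I present here is entirely my own work, done without consulting any other references. The idea of using an implicit graph representation, and the resulting analysis, is inspired by a recent reading of Steven Skiena's Algorithm Design Manual, Second Edition. It is one of the best computer science books I have ever come across.

¹ Technically, if a new rectangle does not overlap any other rectangle, it will still be checked against one rectangle that it does not overlap. If that extra check were always necessary, the algorithm would have an additional m - 1 comparisons to do. Fortunately, m + m + n - 1 = O(m + n), even if we always had to check one extra rectangle (which we don't).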

Answer 2

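The reason you are getting MemoryError is the huge size of the dictionary being created. In the worst case, the dict could have 10^10 keys, which would end up taking all your memory. If there really is a need for it, shelve is a possible way to work with such a large dict.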
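As a rough sketch of what the shelve route could look like: the per-integer dictionary from the question moves to a file on disk. The function name `skyline_area_on_disk`, the `(H, L, R)` tuple layout and the assumption that all heights are positive are mine. shelve keys must be strings, hence the `str(x)`. This trades memory for disk space and a lot of time, so it only postpones the problem for very wide buildings.

```python
import os
import shelve
import tempfile

def skyline_area_on_disk(buildings, path):
    """Same per-integer approach as the question, backed by a shelf at `path`."""
    with shelve.open(path) as heights:
        for h, left, right in buildings:
            for x in range(left, right):
                key = str(x)  # shelve only accepts string keys
                if h > heights.get(key, 0):  # assumes positive heights
                    heights[key] = h
        return sum(heights.values())

# Five buildings as (H, L, R) triples, stored in a throwaway directory
sample = [(3, -3, 0), (2, -1, 1), (4, 2, 4), (2, 3, 7), (3, 6, 8)]
with tempfile.TemporaryDirectory() as tmp:
    print(skyline_area_on_disk(sample, os.path.join(tmp, "heights")))  # 29
```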

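A leaner option is to keep a list of (x, height) breakpoints instead. Say there is a building with 10 0 100 and another with 20 50 150; then that list might hold something like [(-10^9, 0), (0, 10), (50, 20), (150, 0), (10^9, 0)]. As you come across more buildings, you add more entries to this list. This will be O(n^2).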

This might help you further.
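A minimal sketch of the breakpoint-list idea, under my own assumptions: buildings are `(H, L, R)` triples, all coordinates lie strictly between -10^9 and 10^9, and each `(x, height)` entry means "the skyline has this height from x up to the next breakpoint". The names `build_profile`, `profile_area` and `LIMIT` are illustrative. The sketch does not merge neighbouring breakpoints of equal height, so it can keep a redundant entry such as (100, 20) in the example from this answer.

```python
from bisect import bisect_left

LIMIT = 10**9  # assumed bound on |x|, as in the example list

def build_profile(buildings):
    """Return [(x, height), ...]; each height holds until the next x."""
    xs, hs = [-LIMIT, LIMIT], [0, 0]
    for h, left, right in buildings:
        for x in (left, right):
            i = bisect_left(xs, x)
            if xs[i] != x:
                # Split the segment containing x; the new piece inherits its height
                xs.insert(i, x)
                hs.insert(i, hs[i - 1])
        # Raise every segment inside [left, right) to at least h
        for i in range(bisect_left(xs, left), bisect_left(xs, right)):
            hs[i] = max(hs[i], h)
    return list(zip(xs, hs))

def profile_area(profile):
    """Sum width * height over consecutive breakpoints."""
    return sum((x1 - x0) * h for (x0, h), (x1, _) in zip(profile, profile[1:]))

profile = build_profile([(10, 0, 100), (20, 50, 150)])
print(profile_area(profile))  # 2500
```

The list holds at most two new entries per building, so memory grows with the number of buildings rather than with their widths. The list inserts and the inner update loop are what make the worst case quadratic.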
