How can I reduce/optimize memory usage when computing the area of a skyline?

Question

I am trying to compute the area of a skyline (overlapping rectangles that share the same baseline):

building_count = int(input())
items = {} # dictionary, location on x axis is the key, height is the value
count = 0 # total area
for j in range(building_count):
    line = input().split(' ')
    H = int(line[0]) # height
    L = int(line[1]) # left point (start of the building)
    R = int(line[2]) # right point (end of the building)
    for k in range(R - L):
        if not (L+k in  items): # if it's not there, add it
            items[L+k] = H
        elif H > items[L+k]: # if we have a higher building on that index
            items[L+k] = H
for value in items.values(): # we add each column basically
    count += value
print(count)

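Sample input:

The output is 29.

The problem is memory efficiency: when there are a lot of values, the script simply throws a MemoryError. Does anyone have ideas for optimizing the memory usage?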

Answer 1

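You are allocating a separate key-value pair for every single integer value in your range. Imagine the case where L = 1 and R = 100000: your items dictionary will be filled with roughly 100,000 entries for that one building alone. Your basic idea of processing and removing the overlaps is sound, but the way you do it is massive overkill.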
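To make the cost concrete, here is a minimal sketch (not part of the original post) that fills the per-column dictionary for one wide building and compares it with storing the same building as a single tuple. The values of `L`, `R`, and `H` are illustrative assumptions.

```python
# Illustrative values only: one building spanning x = 1 .. 100000.
L, R, H = 1, 100000, 5

# The question's approach: one dictionary entry per integer column.
items = {}
for k in range(R - L):
    items[L + k] = H

# The same information stored as a single (left, right, height) record.
compact = [(L, R, H)]

print(len(items), "dictionary entries versus", len(compact), "tuple")
```

Both representations give the same area, `H * (R - L)`, but the dictionary grows with the width of the building rather than with the number of buildings.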

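Like many other things in life, this is a graph problem in disguise. Picture the vertices as the rectangles you are trying to process and the (weighted) edges as the overlaps. The complication is that you cannot just add up the areas of the vertices and subtract the areas of the overlaps, because many of the overlaps overlap each other as well. The overlap issue can be resolved by applying a transformation that converts two overlapping rectangles into non-overlapping rectangles, effectively cutting the edge that connects them. The transformation is shown in the image below. Notice that in some cases one of the vertices is removed as well, which simplifies the graph, while in another case a new vertex is added:

Green: the overlap to be chopped out.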

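In general, if we have m rectangles and n overlaps between them, constructing the graph would be an O(m²) operation, because we would have to check all vertices for overlaps against each other. However, we can bypass construction of the input graph entirely and get an O(m + n) traversal algorithm, which is optimal since we analyze each rectangle only once and construct the output graph, with no overlaps, as efficiently as possible. O(m + n) assumes that your input rectangles are sorted in ascending order by their left edges. If that is not the case, the algorithm is O(m log(m) + n) to account for the initial sorting step. Note that as the graph density increases, n goes from ~m to ~m². This confirms the intuitive idea that the fewer overlaps there are, the closer the process runs to O(m) time, while the more overlaps there are, the closer it runs to O(m²) time.

The space complexity of the proposed algorithm is O(m): each rectangle in the input results in at most two rectangles in the output, and 2m = O(m).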

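Enough complexity analysis, and on to the algorithm itself. The input is now a sequence of rectangles defined by L, R, H. I will assume that the input is sorted by the left edge L. The output graph is a linked list of rectangles defined by the same parameters, sorted in descending order by right edge. The head of the list is the right-most rectangle. The output has no overlaps between any rectangles, so the total area of the skyline is just the sum of H * (R - L) over the ~m output rectangles.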

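A linked list was chosen because the only two operations we need are iteration from the head node and the cheapest possible insertion that keeps the list in sorted order. The sorting is done as part of the overlap checking, so we do not need any kind of binary search through the list or anything like that.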
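As a minimal sketch of why insertion is cheap, the snippet below replaces one node of a singly-linked list with a two-node chain using only pointer assignments. The `Node` class and `to_list` helper are hypothetical names used for illustration; they are not the `LinkedList` class from this answer.

```python
class Node:
    """Hypothetical minimal singly-linked node, for illustration only."""
    __slots__ = ('value', 'next')

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def to_list(node):
    """Collect the values from `node` to the end of the chain."""
    out = []
    while node is not None:
        out.append(node.value)
        node = node.next
    return out


# Build the list c -> b -> a.
a = Node('a')
b = Node('b', a)
c = Node('c', b)

# Replace b with the chain b2 -> b1. No other node is touched or shifted.
b1 = Node('b1', b.next)
b2 = Node('b2', b1)
c.next = b2

print(to_list(c))
```

The replacement costs the same regardless of how long the list is, which is what the overlap-chopping step relies on.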

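Since the input list is ordered by increasing left edge and the output list is ordered by decreasing right edge, we can guarantee that each added rectangle is checked only against the rectangles it actually overlaps¹. We do the overlap checking and removal as shown in the diagram above, until we reach a rectangle whose left edge is less than or equal to the left edge of the new rectangle. All further rectangles in the output list are guaranteed not to overlap the new rectangle. This check-and-chop operation guarantees that each overlap is visited at most once and that no non-overlapping rectangles are processed unnecessarily, which makes the algorithm optimal.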

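Before I show the code, here is a diagram of the algorithm in action. Red rectangles are the new rectangles; note that their left edges progress to the right. Blue rectangles are ones that have already been added and overlap the new rectangle. Black rectangles have already been added and do not overlap the new one. The numbering represents the order of the output list, which is always counted from the right. A linked list is a perfect structure for maintaining this progression, since it allows cheap insertions and replacements: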

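Here is an implementation of the algorithm. It assumes that the input coordinates are passed in as an iterable of objects having the attributes l, r, and h. The iteration order is assumed to be sorted by left edge. If that is not the case, apply sorted or list.sort to the input first: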

from collections import namedtuple

# Defined in this order so you can sort a list by left edge without a custom key
Rect = namedtuple('Rect', ['l', 'r', 'h'])

class LinkedList:
    """
    Implements a singly-linked list with mutable nodes and an iterator.
    """
    __slots__ = ['value', 'next']

    def __init__(self, value=None, next=None):
        self.value = value
        self.next = next

    def __iter__(self):
        """
        Iterate over the *nodes* in the list, starting with this one.

        The `value` and `next` attribute of any node may be modified
        during iteration.
        """
        while self:
            yield self
            self = self.next

    def __str__(self):
        """
        Provided for inspection purposes.

        Works well with `namedtuple` values.
        """
        return ' -> '.join(repr(x.value) for x in self)


def process_skyline(skyline):
    """
    Turns an iterable of rectangles sharing a common baseline into a
    `LinkedList` of rectangles containing no overlaps.

    The input is assumed to be sorted in ascending order by left edge.
    Each element of the input must have the attributes `l`, `r`, `h`.

    The output will be sorted in descending order by right edge.

    Return `None` if the input is empty.
    """
    def intersect(r1, r2, default=None):
        """
        Return (1) a flag indicating the order of `r1` and `r2`,
        (2) a linked list of between one and three non-overlapping
        rectangles covering the exact same area as `r1` and `r2`,
        (3) a pointer to the last node, and (4) a pointer to the
        second-to-last node, or `default` if there is only one node.

        The flag is set to True if the left edge of `r2` is strictly less
        than the left edge of `r1`. That would indicate that the left-most
        (last) chunk of the tuple came from `r2` instead of `r1`. For the
        algorithm as a whole, that means that we need to keep checking for
        overlaps.

        The resulting list is always returned sorted descending by the
        right edge. The input rectangles will not be modified. If they are
        not returned as-is, a `Rect` object will be used instead.
        """
        # Swap so left edge of r1 <= left edge of r2
        if r1.l > r2.l:
            r1, r2 = r2, r1
            swapped = True
        else:
            swapped = False

        if r2.l >= r1.r:
            # case 0: no overlap at all
            last = LinkedList(r1)
            s2l = result = LinkedList(r2, last)
        elif r1.r < r2.r:
            # case 1: simple overlap
            if r1.h > r2.h:
                # Chop r2
                r2 = Rect(r1.r, r2.r, r2.h)
            else:
                r1 = Rect(r1.l, r2.l, r1.h)
            last = LinkedList(r1)
            s2l = result = LinkedList(r2, last)
        elif r1.h < r2.h:
            # case 2: split into 3
            r1a = Rect(r1.l, r2.l, r1.h)
            r1b = Rect(r2.r, r1.r, r1.h)
            last = LinkedList(r1a)
            s2l = LinkedList(r2, last)
            result = LinkedList(r1b, s2l)
        else:
            # case 3: complete containment
            result = LinkedList(r1)
            last = result
            s2l = default

        return swapped, result, last, s2l

    root = LinkedList()

    skyline = iter(skyline)
    try:
        # Add the first node as-is
        root.next = LinkedList(next(skyline))
    except StopIteration:
        # Empty input iterator
        return None

    for new_rect in skyline:
        prev = root
        for rect in root.next:
            need_to_continue, replacement, last, second2last = \
                    intersect(rect.value, new_rect, prev)
            # Replace the rectangle with the de-overlapped regions
            prev.next = replacement
            if not need_to_continue:
                # Retain the remainder of the list
                last.next = rect.next
                break
            # Force the iterator to move on to the last node
            new_rect = last.value
            prev = second2last

    return root.next

Computing the total area is now trivial:

skyline = [
    Rect(-3, 0, 3), Rect(-1, 1, 2), Rect(2, 4, 4),
    Rect(3, 7, 2), Rect(6, 8, 3),
]
processed = process_skyline(skyline)
area = sum((x.value.r - x.value.l) * x.value.h for x in processed) if processed else None

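Note the change in the order of the input parameters (h moved to the end). The resulting area is 29, which matches what I get by doing the computation by hand. You can also print the result: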

>>> print(processed)
Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) -> Rect(l=2, r=4, h=4) ->
Rect(l=0, r=1, h=2) -> Rect(l=-3, r=0, h=3)

The same result can be seen in the input/output diagram.
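Because the question reads each building as a line of `H L R`, while `Rect` stores `l, r, h`, a small conversion step is needed before calling `process_skyline`. The following is a minimal sketch under that assumption; `parse_buildings` is a hypothetical helper name and the sample lines are made up for illustration.

```python
from collections import namedtuple

# Same field order as in the answer, so plain tuple sorting orders by left edge.
Rect = namedtuple('Rect', ['l', 'r', 'h'])


def parse_buildings(lines):
    """Turn 'H L R' strings into Rect(l, r, h) tuples sorted by left edge."""
    rects = []
    for line in lines:
        h, l, r = map(int, line.split())
        rects.append(Rect(l, r, h))
    rects.sort()  # namedtuples compare field by field, starting with l
    return rects


# Made-up input lines in the question's 'H L R' format, deliberately unsorted.
sample = ["2 3 7", "3 -3 0", "4 2 4"]
parsed = parse_buildings(sample)
print(parsed)
```

The sorted list can then be passed to `process_skyline` directly, since it only requires ascending left edges.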

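As an additional verification, I added a new building, Rect(-4, 9, 1), to the start of the list. It overlaps all the others and adds three units to area, for a final result of 32. processed comes out as: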

Rect(l=8, r=9, h=1) -> Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) ->
Rect(l=2, r=4, h=4) -> Rect(l=1, r=2, h=1) -> Rect(l=0, r=1, h=2) ->
Rect(l=-3, r=0, h=3) -> Rect(l=-4, r=-3, h=1)

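Notes:

Although I am sure this problem has been solved many times over, the solution I present here is entirely my own work, done without consulting any other references. The idea of using an implicit graph representation and the resulting analysis is inspired by a recent reading of Steven Skiena's Algorithm Design Manual, Second Edition. It is one of the best comp-sci books I have ever come across.

¹ Technically, if a new rectangle does not overlap any other rectangle, it will still be checked against one rectangle that it does not overlap. If that extra check were always necessary, the algorithm would make an additional m - 1 comparisons. Fortunately, m + m + n - 1 = O(m + n), even if we always had to check one extra rectangle (which we don't).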

Answer 2

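The reason you are getting a MemoryError is the sheer size of the dictionary being created. In the worst case, the dict could have 10^10 keys, which would end up taking all your memory. If you really need it, shelve is a possible solution for working with such a large dict.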

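Instead, you can keep a list of (position, height) breakpoints. Say there is a building with 10 0 100 and another with 20 50 150; then that list might hold something like [(-10^9, 0), (0, 10), (50, 20), (150, 0), (10^9, 0)]. As you come across more buildings, you can add more entries to this list. This will be O(n^2).

This might help you further.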
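One way to read this suggestion is sketched below. It is an interpretation, not the answerer's code: instead of maintaining the breakpoint list incrementally, it collects every building edge, then finds the tallest building over each slab between consecutive edges. That takes O(n^2) time, but memory is proportional to the number of buildings rather than to their width. The function name `skyline_area` and the `(h, l, r)` tuple layout (matching the question's input order) are assumptions.

```python
def skyline_area(buildings):
    """Area of the skyline for an iterable of (h, l, r) tuples."""
    buildings = list(buildings)
    # The skyline height can only change at a building edge.
    xs = sorted({x for _, l, r in buildings for x in (l, r)})
    area = 0
    for x0, x1 in zip(xs, xs[1:]):
        # Tallest building that spans the whole slab from x0 to x1.
        height = max(
            (h for h, l, r in buildings if l <= x0 and x1 <= r),
            default=0,
        )
        area += height * (x1 - x0)
    return area


# The two buildings from the example above: "10 0 100" and "20 50 150".
print(skyline_area([(10, 0, 100), (20, 50, 150)]))  # 2500
```

For those two buildings, the slabs are 0 to 50 at height 10 and 50 to 150 at height 20, giving 500 + 2000 = 2500.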
