How to reduce/optimize memory usage when calculating the area of a skyline?

Question

I'm trying to calculate the area of a skyline (overlapping rectangles with the same baseline):

building_count = int(input())
items = {} # dictionary, location on x axis is the key, height is the value
count = 0 # total area
for j in range(building_count):
    line = input().split(' ')
    H = int(line[0]) # height
    L = int(line[1]) # left point (start of the building)
    R = int(line[2]) # right point (end of the building)
    for k in range(R - L):
        if not (L+k in  items): # if it's not there, add it
            items[L+k] = H
        elif H > items[L+k]: # if we have a higher building on that index
            items[L+k] = H
for value in items.values(): # we add each column basically
    count += value
print(count)

Sample input would be:

and the output is 29.

The issue is memory efficiency: when there are lots of values, the script simply throws MemoryError. Does anyone have ideas for optimizing memory usage?

Answer 1

You are allocating a separate key-value pair for every single integer value in your range. Imagine the case where L = 1 and R = 100000. Your items dictionary will be filled with nearly 100000 items for that one building. Your basic idea of processing/removing overlaps is sound, but the way you do it is massive overkill.
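To make the scale concrete, here is a small sketch that mirrors the question's per-integer dictionary and counts its entries. The function name `per_integer_columns` and the `(h, l, r)` tuple layout are assumptions made for illustration; they are not part of the original script.

```python
def per_integer_columns(buildings):
    """Mirror the question's approach: one dict entry per integer x covered."""
    items = {}
    for h, l, r in buildings:
        for x in range(l, r):
            # Keep the tallest height seen so far for this column.
            if x not in items or h > items[x]:
                items[x] = h
    return items


# A single building of height 1 spanning x = 1 .. 100000.
one_wide = per_integer_columns([(1, 1, 100000)])
print(len(one_wide))            # 99999 entries for one building
print(sum(one_wide.values()))   # 99999, the same as 1 * (100000 - 1)
```

The memory cost grows with the width of the buildings rather than with their number, which is why wide inputs exhaust memory even when there are few of them.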

Like so much else in life, this is a graph problem in disguise. Imagine the vertices being the rectangles you are trying to process and the (weighted) edges being the overlaps. The complication is that you cannot just add up the areas of the vertices and subtract the areas of the overlaps, because many of the overlaps overlap each other as well. The overlap issue can be resolved by applying a transformation that converts two overlapping rectangles into non-overlapping rectangles, effectively cutting the edge that connects them. The transformation is shown in the image below. Notice that in some cases one of the vertices will be removed as well, simplifying the graph, while in another case a new vertex is added:

Green: overlap to be chopped out.

Normally, if we have m rectangles and n overlaps between them, constructing the graph would be an O(m²) operation because we would have to check all vertices for overlaps against each other. However, we can bypass construction of the input graph entirely to get an O(m + n) traversal algorithm, which is going to be optimal since we will only analyze each rectangle once, and construct the output graph with no overlaps as efficiently as possible. O(m + n) assumes that your input rectangles are sorted according to their left edges in ascending order. If that is not the case, the algorithm will be O(m log(m) + n) to account for the initial sorting step. Note that as the graph density increases, n will go from ~m to ~m². This confirms the intuitive idea that the fewer overlaps there are, the more you would expect the process to run in O(m) time, while the more overlaps there are, the closer you will get to O(m²) time.

The space complexity of the proposed algorithm will be O(m): each rectangle in the input will result in at most two rectangles in the output, and 2m = O(m).

Enough about complexity analysis and on to the algorithm itself. The input will be a sequence of rectangles defined by L, R, H as you have now. I will assume that the input is sorted by the leftmost edge L. The output graph will be a linked list of rectangles defined by the same parameters, sorted in descending order by the rightmost edge. The head of the list will be the rightmost rectangle. The output will have no overlaps between any rectangles, so the total area of the skyline will just be the sum of H * (R - L) for each of the ~m output rectangles.

The reason for picking a linked list is that the only two operations we need are iteration from the head node and the cheapest insertion possible to maintain the list in sorted order. The sorting will be done as part of overlap checking, so we do not need to do any kind of binary search through the list or anything like that.

Since the input list is ordered by increasing left edge and the output list is ordered by decreasing right edge, we can guarantee that each rectangle added will be checked only against the rectangles it actually overlaps ¹. We will do overlap checking and removal as shown in the diagram above until we reach a rectangle whose left edge is less than or equal to the left edge of the new rectangle. All further rectangles in the output list are guaranteed not to overlap with the new rectangle. This check-and-chop operation guarantees that each overlap is visited at most once, and that no non-overlapping rectangles are processed unnecessarily, making the algorithm optimal.

Before I show code, here is a diagram of the algorithm in action. Red rectangles are new rectangles; note that their left edges progress to the right. Blue rectangles are ones that are already added and have overlap with the new rectangle. Black rectangles are already added and have no overlap with the new one. The numbering represents the order of the output list. It is always done from the right. A linked list is a perfect structure to maintain this progression since it allows cheap insertions and replacements:

Here is an implementation of the algorithm which assumes that the input coordinates are passed in as an iterable of objects having the attributes l, r, and h. The iteration order is assumed to be sorted by the left edge. If that is not the case, apply sorted or list.sort to the input first:

from collections import namedtuple

# Defined in this order so you can sort a list by left edge without a custom key
Rect = namedtuple('Rect', ['l', 'r', 'h'])

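class LinkedList:
    """
    Implements a singly-linked list with mutable nodes and an iterator.
    """
    # The docstring must come first in the class body to be picked up
    # as `LinkedList.__doc__`; `__slots__` keeps each node small.
    __slots__ = ['value', 'next']
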
    def __init__(self, value=None, next=None):
        self.value = value
        self.next = next

    def __iter__(self):
        """
        Iterate over the *nodes* in the list, starting with this one.

        The `value` and `next` attribute of any node may be modified
        during iteration.
        """
        while self:
            yield self
            self = self.next

    def __str__(self):
        """
        Provided for inspection purposes.

        Works well with `namedtuple` values.
        """
        return ' -> '.join(repr(x.value) for x in self)


def process_skyline(skyline):
    """
    Turns an iterable of rectangles sharing a common baseline into a
    `LinkedList` of rectangles containing no overlaps.

    The input is assumed to be sorted in ascending order by left edge.
    Each element of the input must have the attributes `l`, `r`, `h`.

    The output will be sorted in descending order by right edge.

    Return `None` if the input is empty.
    """
    def intersect(r1, r2, default=None):
        """
        Return (1) a flag indicating the order of `r1` and `r2`,
        (2) a linked list of between one and three non-overlapping
        rectangles covering the exact same area as `r1` and `r2`,
        (3) a pointer to the last node, and (4) a pointer to the
        second-to-last node, or `default` if there is only one node.

        The flag is set to True if the left edge of `r2` is strictly less
        than the left edge of `r1`. That would indicate that the left-most
        (last) chunk of the list came from `r2` instead of `r1`. For the
        algorithm as a whole, that means that we need to keep checking for
        overlaps.

        The resulting list is always returned sorted descending by the
        right edge. The input rectangles will not be modified. If they are
        not returned as-is, a `Rect` object will be used instead.
        """
        # Swap so left edge of r1 <= left edge of r2
        if r1.l > r2.l:
            r1, r2 = r2, r1
            swapped = True
        else:
            swapped = False

        if r2.l >= r1.r:
            # case 0: no overlap at all
            last = LinkedList(r1)
            s2l = result = LinkedList(r2, last)
        elif r1.r < r2.r:
            # case 1: simple overlap
            if r1.h > r2.h:
                # Chop r2
                r2 = Rect(r1.r, r2.r, r2.h)
            else:
                r1 = Rect(r1.l, r2.l, r1.h)
            last = LinkedList(r1)
            s2l = result = LinkedList(r2, last)
        elif r1.h < r2.h:
            # case 2: split into 3
            r1a = Rect(r1.l, r2.l, r1.h)
            r1b = Rect(r2.r, r1.r, r1.h)
            last = LinkedList(r1a)
            s2l = LinkedList(r2, last)
            result = LinkedList(r1b, s2l)
        else:
            # case 3: complete containment
            result = LinkedList(r1)
            last = result
            s2l = default

        return swapped, result, last, s2l

    root = LinkedList()

    skyline = iter(skyline)
    try:
        # Add the first node as-is
        root.next = LinkedList(next(skyline))
    except StopIteration:
        # Empty input iterator
        return None

    for new_rect in skyline:
        prev = root
        for rect in root.next:
            need_to_continue, replacement, last, second2last = \
                    intersect(rect.value, new_rect, prev)
            # Replace the rectangle with the de-overlapped regions
            prev.next = replacement
            if not need_to_continue:
                # Retain the remainder of the list
                last.next = rect.next
                break
            # Force the iterator to move on to the last node
            new_rect = last.value
            prev = second2last

    return root.next
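
The function above expects rectangles with the attributes l, r, h, sorted by left edge, while the question reads each building as a line of the form H L R. A minimal sketch of the conversion follows; the helper name `parse_buildings` is hypothetical, and `Rect` is redefined here only so that the snippet runs on its own.

```python
from collections import namedtuple

# Same field order as above: sorting plain tuples compares `l` first.
Rect = namedtuple('Rect', ['l', 'r', 'h'])


def parse_buildings(lines):
    """Convert 'H L R' text lines into Rect(l, r, h) sorted by left edge."""
    rects = []
    for line in lines:
        h, l, r = map(int, line.split())
        rects.append(Rect(l, r, h))
    # namedtuples sort lexicographically, so no custom key is needed.
    return sorted(rects)


print(parse_buildings(["2 3 7", "3 -3 0", "4 2 4"]))
# [Rect(l=-3, r=0, h=3), Rect(l=2, r=4, h=4), Rect(l=3, r=7, h=2)]
```

The sorted result can then be handed to process_skyline directly.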

Computing the total area is now trivial:

skyline = [
    Rect(-3, 0, 3), Rect(-1, 1, 2), Rect(2, 4, 4),
    Rect(3, 7, 2), Rect(6, 8, 3),
]
processed = process_skyline(skyline)
area = sum((x.value.r - x.value.l) * x.value.h for x in processed) if processed else None

Notice the altered order of the input parameters (h moved to the end). The resulting area is 29. This matches what I get by doing the computation by hand. You can also do

>>> print(processed)
Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) -> Rect(l=2, r=4, h=4) ->
Rect(l=0, r=1, h=2) -> Rect(l=-3, r=0, h=3)

This is to be expected from the diagram of the inputs/output shown below:

As an additional verification, I added a new building, Rect(-4, 9, 1), to the start of the list. It overlaps all the others and adds three units to area, for a final result of 32. processed comes out as:

Rect(l=8, r=9, h=1) -> Rect(l=6, r=8, h=3) -> Rect(l=4, r=6, h=2) ->
Rect(l=2, r=4, h=4) -> Rect(l=1, r=2, h=1) -> Rect(l=0, r=1, h=2) ->
Rect(l=-3, r=0, h=3) -> Rect(l=-4, r=-3, h=1)
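
Both totals can also be cross-checked with an independent and much simpler method. The sketch below is not the linked-list algorithm described above: it cuts the x axis at every building edge and takes the tallest building over each strip, which costs O(m²) time but only O(m) memory. The function name `skyline_area` and the plain `(l, r, h)` tuples are assumptions made for illustration.

```python
def skyline_area(buildings):
    """Area of the union of (l, r, h) rectangles sharing a baseline."""
    # The skyline height can only change at a building edge.
    xs = sorted({x for l, r, _ in buildings for x in (l, r)})
    area = 0
    for left, right in zip(xs, xs[1:]):
        # Tallest building spanning this whole strip (0 if none does).
        height = max(
            (h for l, r, h in buildings if l <= left and right <= r),
            default=0,
        )
        area += height * (right - left)
    return area


skyline = [(-3, 0, 3), (-1, 1, 2), (2, 4, 4), (3, 7, 2), (6, 8, 3)]
print(skyline_area(skyline))                  # 29
print(skyline_area([(-4, 9, 1)] + skyline))   # 32
```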

Note:

While I am sure that this problem has been solved many times over, the solution I present here is entirely my own work, done without consulting any other references. The idea of using an implicit graph representation and the resulting analysis is inspired by a recent reading of Steven Skiena's Algorithm Design Manual, Second Edition. It is one of the best comp-sci books I have ever come across.

¹ Technically, if a new rectangle does not overlap any other rectangles, it will be checked against one rectangle it does not overlap. If that extra check were always needed, the algorithm would have an additional m - 1 comparisons to do. Fortunately, m + m + n - 1 = O(m + n) even if we always had to check one extra rectangle (which we don't).

Answer 2

The reason for getting MemoryError is the huge size of the dictionary being created. In the worst case, the dict can have 10^10 keys, which would end up taking all your memory. If there really is a need, shelve is a possible solution for working with such a large dict.

A more compact alternative is to keep a sorted list of (x, height) pairs, one for each point where the skyline height changes. Let's say there is a building with 10 0 100 and another with 20 50 150; then that list might have info like [(-10^9, 0), (0, 10), (50, 20), (150, 0), (10^9, 0)]. As you come across more buildings, you can add more entries to this list. Processing n buildings this way will be O(n^2) in time in the worst case.
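
A minimal sketch of that breakpoint list, under a few assumptions: the positions and heights are kept in two parallel lists, the sentinels sit at ±10^9 as in the example, and every building lies strictly between them. The function names `add_building` and `breakpoint_area` are hypothetical. Adjacent breakpoints with equal heights are not merged, so the list can contain a redundant entry compared with the example above.

```python
import bisect


def add_building(xs, hs, h, l, r):
    """Raise the skyline to at least h on [l, r); hs[i] applies from xs[i]."""
    for x in (l, r):
        i = bisect.bisect_left(xs, x)
        if xs[i] != x:
            # New breakpoint inherits the height of the segment it splits.
            xs.insert(i, x)
            hs.insert(i, hs[i - 1])
    start = bisect.bisect_left(xs, l)
    end = bisect.bisect_left(xs, r)
    for i in range(start, end):
        hs[i] = max(hs[i], h)


def breakpoint_area(xs, hs):
    """Sum height * width over consecutive breakpoints."""
    return sum(h * (x2 - x1) for x1, x2, h in zip(xs, xs[1:], hs))


xs, hs = [-10**9, 10**9], [0, 0]      # sentinels, height 0 everywhere
add_building(xs, hs, 10, 0, 100)      # the "10 0 100" building
add_building(xs, hs, 20, 50, 150)     # the "20 50 150" building
print(list(zip(xs, hs)))
print(breakpoint_area(xs, hs))        # 2500
```

Memory now scales with the number of buildings (at most two breakpoints each) instead of with their widths. Each `list.insert` and each update loop is linear in the list length, which is where the quadratic worst case comes from.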

This might help you further.

2 answers

Answer 1
3 2017-11-16 08:22:25

Answer 2
2 Accepted 2017-11-14 20:11:06
