简体   繁体   中英

Share memory between identical objects in python

I create a long list of python objects, with many identical entries. In order to save some memory, I want to ensure that identical elements share their memory. For example:

In [1]: alist = [(3, 4), (3, 4)]

In [2]: alist[0] == alist[1]
Out[2]: True

In [3]: alist[0] is alist[1]
Out[3]: False

The two list elements are equal, but use separate memory. In this simple example the problem can be fixed by

In [4]: alist[1] = alist[0]

In [5]: alist[1] is alist[0]
Out[5]: True

How can this be done in a more general way? The list is not sorted, so identical elements are usually not next to each other. One solution I came up with is this:

g_dupdict = dict()
def dedup(x):
  try:
    return g_dupdict[x]
  except KeyError:
    g_dupdict[x] = x
    return x

for k in range(len(alist)):
  alist[k] = dedup(alist[k])

This works, but introduces new problems. It seems silly to use a dict, a set should do, but I don't know how to make this work. And then the dict holds an additional reference to each object, so the memory doesn't get freed when the elements are deleted from the list. To fix this, I delete the dict occasionally, but as a result it must be re-created whenever new elements get added to the list. Is there is better solution to this problem? Thanks.

I imagine you are manipulating huge data structures in order to be concern about that. One way to keep g_dedupdict scoped to a given data structure is to encapsulate it inside your own list implementation:

class CachedList(list):
    def __init__(self, iterable=(), cache=None):
        self.__cache = cache or {}
        super().__init__([self.__cached(item) for item in iterable])

    def append(self, e):
        super().append(self.__cached(e))

    def clear(self):
        self.__cache.clear()
        super().clear()

    def copy(self):
        return CachedList(self, self.__cache)

    def extend(self, iterable):
        super().extend([self.__cached(item) for item in iterable])

    def insert(self, index, e):
        super().insert(index, self.__cached(e))

    def remove(self, e):
        super().remove(self.__cached(e))
        del self.__cache[hash(e)]

    def __cached(self, e):
        if hash(e) not in self.__cache:
            self.__cache[hash(e)] = e
        return self.__cache[hash(e)]

    def __add__(self, iterable):
        cached_iterable = [self.__cached(item) for item in iterable]
        return CachedList(super().__add__(cached_iterable), self.__cache)

    def __delitem__(self, e):
        super().__delitem__(self.__cached(e))
        del self.__cache[hash(e)]

    def __iadd__(self, e):
        self.append(self.__cached(e))
        return self

    def __setitem__(self, index, e):
        super().__setitem__(index, self.__cached(e))


if __name__ == '__main__':
    e1 = (3, 4)
    e2 = (3, 4)
    assert e1 == e2
    assert e1 is not e2
    assert hash(e1) == hash(e2)

    alist = CachedList([(3, 4), (3, 4)])
    assert alist[0] == alist[1]
    assert alist[0] is alist[1]

    alist[0] = (3, 4)
    assert alist[0] == alist[1]
    assert alist[0] is alist[1]

    alist += (3, 4)
    assert alist[0] == alist[-1]
    assert alist[0] is alist[-1]

    newlist = alist + [(3, 4)]
    assert newlist[0] == newlist[-1]
    assert newlist[0] is newlist[-1]

The implementation may not be complete, but I think can get the idea from the code snippet above.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM