python: class vs tuple huge memory overhead (?)

I'm storing a lot of complex data in tuples/lists, but would prefer to use small wrapper classes to make the data structures easier to understand, e.g.

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

p = Person('foo', 'bar')
print(p.last)
...

would be preferable to

p = ['foo', 'bar']
print(p[1])
...

However, there seems to be a horrible memory overhead:

l = [Person('foo', 'bar') for i in range(10000000)]
# IPython now takes 1.7 GB RAM

and

del l
l = [('foo', 'bar') for i in range(10000000)]
# now just 118 MB RAM

Why? Is there any obvious alternative solution that I didn't think of?

Thanks!

(I know, in this example the 'wrapper' class looks silly. But when the data becomes more complex and nested, it is more useful.)

As others have said in their answers, you have to create distinct objects for the comparison to make sense.

So, let's compare some approaches.

tuple

l = [(i, i) for i in range(10000000)]
# memory taken by Python3: 1.0 GB

class Person

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 2.0 GB

namedtuple (tuple + __slots__)

from collections import namedtuple
Person = namedtuple('Person', 'first last')

l = [Person(i, i) for i in range(10000000)]
# memory: 1.1 GB

namedtuple is basically a class that extends tuple and sets __slots__ = (), so instances get no per-instance __dict__; the named fields are read-only getters that index into the underlying tuple, and it adds some helper methods such as _make, _asdict and _replace. (Before Python 3.7 you could see the exact generated code by calling it with verbose=True; that parameter was removed in 3.7.)
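Since current Python versions no longer accept verbose=True, a quick way to check these claims is to inspect an instance directly. This is a minimal sketch using only the standard library; Person mirrors the example above, and the printed values in comments assume CPython 3.8 or later.

```python
import sys
from collections import namedtuple

Person = namedtuple("Person", "first last")
p = Person("foo", "bar")

# A namedtuple instance is a real tuple: indexing and unpacking still work.
assert isinstance(p, tuple) and p[1] == p.last
first, last = p

# __slots__ is empty, so instances carry no per-instance __dict__ ...
print(Person.__slots__)        # ()
print(hasattr(p, "__dict__"))  # False
# ... and an instance is exactly as large as a plain 2-tuple.
print(sys.getsizeof(p) == sys.getsizeof(("foo", "bar")))  # True

# Fields are read-only getters that index into the tuple.
try:
    p.first = "baz"
except AttributeError:
    print("fields are read-only")

# Helper methods generated by namedtuple.
print(p._asdict())
print(p._replace(last="baz"))  # Person(first='foo', last='baz')
```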

class Person + __slots__

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 0.9 GB

This is a trimmed-down version of the namedtuple above, and a clear winner: even better than pure tuples.
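The figures above are whole-process RAM readings. As a hedged sketch, the same comparison can be repeated at a smaller scale with the standard-library tracemalloc module. The class names, the helper bytes_per_object and n = 100_000 are illustrative choices, not part of the original answer; the averages include the list and the int values, which are identical for every variant.

```python
import tracemalloc
from collections import namedtuple


class PlainPerson:
    def __init__(self, first, last):
        self.first = first
        self.last = last


class SlottedPerson:
    __slots__ = ("first", "last")

    def __init__(self, first, last):
        self.first = first
        self.last = last


NamedPerson = namedtuple("NamedPerson", "first last")


def bytes_per_object(factory, n=100_000):
    """Average bytes traced by tracemalloc per element of a list of n objects."""
    tracemalloc.start()
    try:
        # Distinct values so every element is a separate object
        # (a constant literal such as ('foo', 'bar') would be shared).
        items = [factory(i, i) for i in range(n)]
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current / len(items)


results = {
    "tuple": bytes_per_object(lambda a, b: (a, b)),
    "namedtuple": bytes_per_object(NamedPerson),
    "class": bytes_per_object(PlainPerson),
    "class + __slots__": bytes_per_object(SlottedPerson),
}

for name, size in results.items():
    print(f"{name:>18}: {size:6.1f} bytes/object")
```

Expect the __slots__ class to come out well below the plain class; the exact ordering of tuple, namedtuple and __slots__ can shift between CPython versions and builds.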

Using __slots__ decreases the memory footprint quite a bit (from 1.7 GB to 625 MB in my test), since each instance no longer needs to hold a dict to store its attributes.

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

The drawback is that you can no longer add attributes to an instance after it is created; the class only provides storage for the attributes listed in __slots__.
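A small sketch of that trade-off: declared slots can still be reassigned, but any other attribute raises AttributeError. As a documented escape hatch, listing '__dict__' in __slots__ brings dynamic attributes back (and with them a per-instance dict); the class name FlexiblePerson is hypothetical.

```python
class Person:
    __slots__ = ("first", "last")

    def __init__(self, first, last):
        self.first = first
        self.last = last


p = Person("foo", "bar")
p.last = "baz"              # reassigning a declared slot works

try:
    p.middle = "x"          # 'middle' is not listed in __slots__
except AttributeError as exc:
    print("AttributeError:", exc)

print(hasattr(p, "__dict__"))  # False: no per-instance dict


class FlexiblePerson:
    # '__dict__' in __slots__ re-enables ad-hoc attributes,
    # at the cost of a per-instance dict again.
    __slots__ = ("first", "last", "__dict__")

    def __init__(self, first, last):
        self.first = first
        self.last = last


q = FlexiblePerson("foo", "bar")
q.middle = "x"              # allowed again
print(q.__dict__)           # {'middle': 'x'}
```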

There is yet another way to reduce the memory occupied by objects: in addition to turning off __dict__ and __weakref__, turn off support for cyclic garbage collection. This is implemented in the recordclass library:

$ pip install recordclass

>>> import sys
>>> from recordclass import dataobject, make_dataclass

Create the class:

class Person(dataobject):
    first: str
    last: str

or

>>> Person = make_dataclass('Person', 'first last')

As a result:

>>> print(sys.getsizeof(Person(100,100)))
32

For a __slots__-based class we have:

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

>>> print(sys.getsizeof(Person(100,100)))
56

So even more memory can be saved.
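The gap between the 32 and 56 bytes above comes from the garbage collector's per-object bookkeeping. This sketch uses only the standard library (it does not touch recordclass, which is third-party): it checks that an ordinary __slots__ instance is tracked by the cyclic GC and compares the object's own __sizeof__ with what sys.getsizeof reports. On a typical 64-bit, non-free-threaded CPython the body should be 32 bytes; the extra amount from sys.getsizeof is the GC header, whose size depends on the CPython version, so your total may differ from 56.

```python
import gc
import sys


class Person:
    __slots__ = ("first", "last")

    def __init__(self, first, last):
        self.first = first
        self.last = last


p = Person(100, 100)

# Instances of ordinary Python classes are tracked by the cyclic GC.
print(gc.is_tracked(p))  # True

# __sizeof__ covers the object body: header plus the two slot pointers.
body = p.__sizeof__()
# sys.getsizeof additionally counts the GC header stored in front of the object.
total = sys.getsizeof(p)
print(body, total, total - body)
```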

For the dataobject-based class:

l = [Person(i, i) for i in range(10000000)]
# memory size: 681 MB

For the __slots__-based class:

l = [Person(i, i) for i in range(10000000)]
# memory size: 921 MB

In your second example, you only create one object: ('foo', 'bar') is a tuple made only of constants, so it is itself stored as a single constant and every list element refers to that same tuple.

>>> l = [('foo', 'bar') for i in range(10000000)]
>>> id(l[0])
4330463176
>>> id(l[1])
4330463176
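A small CPython sketch of why the original measurement was misleading: a constant tuple literal is shared, while instances and tuples built from a loop variable are distinct objects. The names shared, people and fresh are illustrative.

```python
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last


n = 1_000
shared = [("foo", "bar") for _ in range(n)]        # constant tuple literal
people = [Person("foo", "bar") for _ in range(n)]  # a new instance per element
fresh = [(i, i) for i in range(n)]                 # tuple built from a variable

# All elements are alive at once, so distinct objects have distinct ids.
print(len({id(x) for x in shared}))  # 1
print(len({id(x) for x in people}))  # 1000
print(len({id(x) for x in fresh}))   # 1000
print(shared[0] is shared[-1])       # True
```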

Classes have the overhead that their attributes are stored in a per-instance dictionary. Therefore namedtuples need only about half the memory.
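To see that per-instance dictionary directly, here is a minimal sketch with the Person class from the question (key order as shown assumes Python 3.7+, where dicts keep insertion order):

```python
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last


p = Person("foo", "bar")

# The attributes of an ordinary instance live in its own dict ...
print(p.__dict__)        # {'first': 'foo', 'last': 'bar'}

# ... which is what allows ad-hoc attributes (and what costs memory).
p.middle = "x"
print(list(p.__dict__))  # ['first', 'last', 'middle']
```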
