Python: 1500% memory overhead from loading a binary file

I'm trying to load a binary file containing 120 MB of data using the following routine:

from struct import *
def loadBinaryData():
    d = []
    size = calcsize('iif')
    with open("ratings.bin",'rb') as f:
        while True:
            data = f.read(size)
            if not data: break
            (a,b,c) = unpack_from('iif',data)   
            d.append(((a,b),c))
    return d    

However, when I execute this, my Python process climbs to 2.2 GB of RAM, which, to me, feels very wrong. Are there any obvious errors that explain this behavior? Am I misusing some very wasteful Python feature?

One more thing: I don't want to use generator functions for this; I actually need all of the data in memory.

I would try converting more data at once to cut down on per-object memory overhead. Each record unpacks 12 bytes ('iif') into a 3-tuple, so a 120 MB file holds around 10 million records.

If you look at this article, you can see that:

  • An integer has an overhead of 24 bytes.
  • A float does, too.
  • A tuple has 63 bytes of overhead.
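
These sizes are easy to sanity-check with sys.getsizeof; the exact numbers vary between CPython versions and 32-/64-bit builds, so treat the figures above as approximate:

import sys

# Per-object sizes on the current interpreter; exact values differ
# between CPython versions and builds.
print(sys.getsizeof(10 ** 6))   # an int outside the small-int cache
print(sys.getsizeof(1.5))       # a float
print(sys.getsizeof((1, 2)))    # a 2-tuple (the container only, not
                                # the objects it points to)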

10 million records at (2 * 24 + 24 + 2 * 63) bytes each, once you count both the inner (a, b) tuple and the outer pair, weigh in at roughly 1.98 GB; add 8 bytes per entry for the pointer slot in the d list itself, plus the garbage created while the list grows, and the 2.2 GB you observe is about what to expect.
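
As a rough sketch of "converting more data at once" (the function name and the flat-column layout here are my own, not from the question): read the whole file with a single call, unpack every 12-byte record in one pass with struct.iter_unpack, and store the three fields in typed array columns so there is no per-record tuple and no per-element Python object:

from array import array
from struct import iter_unpack

def load_columns(path="ratings.bin"):
    # Hypothetical alternative loader: one read of the whole file,
    # then one unpacking pass over the buffer.
    with open(path, "rb") as f:
        buf = f.read()

    # Three typed columns instead of ~10 million ((a, b), c) records.
    # array('i') holds C ints and array('f') C floats (4 bytes each),
    # so there is no per-element object or tuple overhead.
    a_col, b_col, c_col = array("i"), array("i"), array("f")
    for a, b, c in iter_unpack("iif", buf):
        a_col.append(a)
        b_col.append(b)
        c_col.append(c)
    return a_col, b_col, c_col

If you really do need the nested ((a, b), c) tuples, the bulk read and iter_unpack loop still help a little by removing millions of small read() calls, but the per-object overhead discussed above stays.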
