Python: 1500% memory overhead from loading a binary file

I'm trying to load a binary file containing 120 MB of data using the following routine:

from struct import *
def loadBinaryData():
    d = []
    size = calcsize('iif')
    with open("ratings.bin",'rb') as f:
        while True:
            data = f.read(size)
            if not data: break
            (a,b,c) = unpack_from('iif',data)   
            d.append(((a,b),c))
    return d    

However, when I execute this, my Python process climbs to 2.2 GB of RAM, which, to me, feels very wrong. Are there any obvious errors that explain this behavior? Am I misusing some very wasteful Python feature?

One more thing: I don't want to use generator functions for this; I actually need all of the data in memory.

I would try converting more data at once to cut down on per-object memory overhead. Each record unpacks 12 bytes ('iif') into a 3-tuple, so a 120 MB file holds around 10 million records.

If you look at this article, you can see that:

  • An integer has an overhead of 24 bytes.
  • A float does, too.
  • A tuple has 63 bytes of overhead.
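
These sizes are easy to sanity-check with sys.getsizeof; the exact numbers vary between CPython versions and 32-/64-bit builds, so treat the figures above as approximate:

import sys

# Per-object sizes on the current interpreter; exact values differ
# between CPython versions and builds.
print(sys.getsizeof(10 ** 6))   # an int outside the small-int cache
print(sys.getsizeof(1.5))       # a float
print(sys.getsizeof((1, 2)))    # a 2-tuple (the container only, not
                                # the objects it points to)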

10 million records at (2 * 24 + 24 + 2 * 63) bytes each, once you count both the inner (a, b) tuple and the outer pair, weigh in at roughly 1.98 GB; add 8 bytes per entry for the pointer slot in the d list itself, plus the garbage created while the list grows, and the 2.2 GB you observe is about what to expect.
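
As a rough sketch of "converting more data at once" (the function name and the flat-column layout here are my own, not from the question): read the whole file with a single call, unpack every 12-byte record in one pass with struct.iter_unpack, and store the three fields in typed array columns so there is no per-record tuple and no per-element Python object:

from array import array
from struct import iter_unpack

def load_columns(path="ratings.bin"):
    # Hypothetical alternative loader: one read of the whole file,
    # then one unpacking pass over the buffer.
    with open(path, "rb") as f:
        buf = f.read()

    # Three typed columns instead of ~10 million ((a, b), c) records.
    # array('i') holds C ints and array('f') C floats (4 bytes each),
    # so there is no per-element object or tuple overhead.
    a_col, b_col, c_col = array("i"), array("i"), array("f")
    for a, b, c in iter_unpack("iif", buf):
        a_col.append(a)
        b_col.append(b)
        c_col.append(c)
    return a_col, b_col, c_col

If you really do need the nested ((a, b), c) tuples, the bulk read and iter_unpack loop still help a little by removing millions of small read() calls, but the per-object overhead discussed above stays.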
