
Python/JSON Loaded Object Size Varies from File Size

Recently, I had data in a users.json file that took a long time to load in VS Code because the file was too large (surprising to me, since it is only a 29 MB file). I wanted to use this chance to play around with Python's memory usage, so I loaded the whole file into memory, and it worked as expected.

I do have a question, though, or rather something I need explained; forgive me if the answer is too obvious.

When I inspected the loaded JSON object, I found that the object size (1.3 MB) was far smaller than the file size (29.6 MB) on my file system (macOS). How could this be? The difference in size is just too large to ignore. To make things worse, I had a smaller file, and for that one the on-disk and loaded sizes were similar (~358 KB), haha.

import json

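# Load the whole file into memory as Python objects.
with open('users.json') as infile:
    data = json.load(infile)
    # __sizeof__() reports only the top-level container, not its contents.
    print(f'Object Item Count: {len(data):,} items \nObject Size: {data.__sizeof__():,} bytes')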

Using sys.getsizeof(data) would return something similar, maybe with some gc overhead.

This returns the accurate size of the file on disk (29,586,765 bytes, about 29 MB):

from pathlib import Path

Path('users.json').stat().st_size

Can someone please explain what is happening? One would think the sizes should be similar, or maybe I'm wrong.

sys.getsizeof() doesn't recurse into objects:

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

All of the strings, numbers, etc. that get loaded from your JSON file are those aforementioned "objects being referred to".
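To see this shallow accounting in isolation, here is a small sketch. It assumes CPython and uses made-up lists rather than the users.json data: two lists with the same number of elements report the same size, no matter how large the strings they point to are.

```python
import sys

# Two lists built the same way, with the same number of elements.
short_strings = [str(i) for i in range(1000)]
long_strings = [str(i) * 1000 for i in range(1000)]

# Shallow size: only the list object itself (header plus element pointers).
shallow_short = sys.getsizeof(short_strings)
shallow_long = sys.getsizeof(long_strings)

# Size of the strings that the second list refers to, counted separately.
referenced_long = sum(sys.getsizeof(s) for s in long_strings)

print(f'list of short strings: {shallow_short:,} bytes')
print(f'list of long strings:  {shallow_long:,} bytes')
print(f'strings referenced by the second list: {referenced_long:,} bytes')
```

The two list sizes match because a list stores only references; the string data lives in separate objects that sys.getsizeof() on the list never visits.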

For a more accurate result, you could recurse into the object yourself and add up the sizes of everything it refers to.
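As a rough sketch, one way to get closer to the real footprint is to walk the loaded object and sum sys.getsizeof() over everything reachable from it. The helper name deep_getsizeof is hypothetical (it is not part of the standard library), and the walk only descends into the container types that json.load() produces, so treat the result as an estimate rather than an exact figure.

```python
import json
import sys


def deep_getsizeof(root):
    """Estimate the bytes used by root plus every object reachable from it."""
    seen = set()      # ids of objects already counted, so shared objects count once
    stack = [root]    # explicit stack avoids hitting the recursion limit
    total = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        total += sys.getsizeof(obj)
        if isinstance(obj, dict):
            stack.extend(obj.keys())
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple, set, frozenset)):
            stack.extend(obj)
    return total


# Small stand-in for the real file; the sample data is made up.
sample = json.loads('[{"name": "Ada", "tags": ["a", "b"]}, {"name": "Bob", "tags": []}]')
print(f'shallow: {sys.getsizeof(sample):,} bytes')
print(f'deep:    {deep_getsizeof(sample):,} bytes')
```

Calling deep_getsizeof(data) on the object loaded from users.json should give a number much closer to what the process actually holds in memory than data.__sizeof__() does.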

That said, some objects will be smaller in memory than on disk. For instance, a large number such as 36 << 921 takes 279 bytes on disk as decimal digits, while sys.getsizeof() puts it at 148 bytes in memory. Similarly, a smart enough JSON decoder could share objects for repeating dict keys, which the default JSON decoder actually does (see https://github.com/python/cpython/commit/7d6e076f6d8dd48cfd748b02dad17dbeb0b346a3).
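Both effects can be checked with a short sketch. It assumes CPython; the exact in-memory byte count depends on the build, and the "user name" key is just an illustrative placeholder.

```python
import json
import sys

# A large integer: compare its decimal text length with its in-memory size.
n = 36 << 921
digits_on_disk = len(json.dumps(n))   # characters the number occupies in a JSON file
size_in_memory = sys.getsizeof(n)     # bytes used by the int object itself
print(f'on disk: {digits_on_disk} bytes, in memory: {size_in_memory} bytes')

# Repeated dict keys: check whether the decoder reuses one string object.
records = json.loads('[{"user name": 1}, {"user name": 2}]')
first_key = next(iter(records[0]))
second_key = next(iter(records[1]))
keys_shared = first_key is second_key
print(f'same key object reused: {keys_shared}')
```

When the keys are shared, a file with many records that repeat the same field names pays for each key string once in memory, while on disk the key text is written out in every record.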
