Recently, I had data in a users.json file that was taking a long time to load in VS Code because the file was too large (surprising to me, since it is only a 29 MB file). I wanted to use this chance to play around with Python's memory usage, so I loaded the whole file into memory, and it worked as expected.

I do have a question, though, or rather something I need explained; forgive me if the answer is too obvious.

When I inspected the loaded json object, I found that the object size (1.3 MB) was far less than the file size (29.6 MB) on my file system (macOS). How could this be? The difference in size is too large to ignore. To make things worse, I had a smaller file, and that one returned similar sizes on disk and loaded (~358 KB), haha.
import json
with open('users.json') as infile:
    data = json.load(infile)
print(f'Object Item Count: {len(data):,} items \nObject Size: {data.__sizeof__():,} bytes')
Using sys.getsizeof(data) would return something similar, maybe with some gc overhead.
This returns the accurate size of the file on disk (29586765 bytes, 29 MB):
from pathlib import Path
Path('users.json').stat().st_size
Can someone please explain what is happening? One would think the two sizes should be similar, or maybe I'm wrong.
sys.getsizeof() doesn't recurse into objects. From the documentation:
"Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to."
All of the strings, numbers, etc. that get loaded from your JSON file are those aforementioned "objects being referred to".
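To make the difference concrete, here is a minimal sketch of a recursive size estimate. The helper name deep_getsizeof and the sample data are made up for illustration, and the function only descends into dicts, lists, tuples, and sets, so treat the result as an approximation rather than an exact measurement.

```python
import json
import sys

def deep_getsizeof(obj, seen=None):
    """Approximate size in bytes of obj plus everything it refers to.

    Hypothetical helper for illustration; it only walks the container
    types that json.load() can produce (plus tuples and sets).
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        # Count each distinct object once, even if it is referenced many times.
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size

# Made-up sample standing in for the contents of users.json.
sample = json.loads('[{"name": "Ada", "tags": ["a", "b"]}, {"name": "Grace", "tags": []}]')
shallow = sys.getsizeof(sample)   # the outer list only
deep = deep_getsizeof(sample)     # the list plus the dicts, strings, and lists inside it
print(f"shallow: {shallow:,} bytes, deep: {deep:,} bytes")
```

Running the same helper on the data loaded from users.json should give a figure much closer to what the process actually holds than data.__sizeof__() does.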
For a more accurate result, you could
That said, though, some objects will be smaller in memory than on disk; for instance, a large number, say, 36 << 921, is 279 bytes on disk and sys.getsizeof() pins it at 148 bytes in memory. Similarly, a smart enough JSON decoder (which the default JSON decoder actually is, see https://github.com/python/cpython/commit/7d6e076f6d8dd48cfd748b02dad17dbeb0b346a3) could share objects for repeating dict keys.
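Both effects can be checked with a short sketch. It assumes CPython; the exact in-memory size of the integer depends on the interpreter build, and the key sharing is an implementation detail of the json module rather than a documented guarantee. The "user_name" records are invented for the example.

```python
import json
import sys

# A large integer: one byte per decimal digit on disk, a compact object in memory.
big = 36 << 921
on_disk = len(str(big))          # number of characters it occupies in a JSON file
in_memory = sys.getsizeof(big)   # size of the int object; varies by build
print(f"on disk: {on_disk} bytes, in memory: {in_memory} bytes")

# Repeated dict keys: compare the key objects of two decoded records.
records = json.loads('[{"user_name": "a"}, {"user_name": "b"}]')
first_key = next(iter(records[0]))
second_key = next(iter(records[1]))
print("same key object reused:", first_key is second_key)
```

If the decoder reuses one string object per distinct key, a file with many records that share the same keys pays for each key string once in memory, while the file on disk repeats it in every record.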