
How to load 2 big dict in Python?

I am a newbie in Python. I get a memory error when I load two dicts from two files. I load the files like this:

import json

hashtable_album = {}
with open(filename, 'rb') as f:
    for line in f:
        # Each line is "<key>####<json>"; split at the separator.
        q = line.find("####")
        itembuf = line[:q]
        dictbuf = line[q + 4:-1]  # drop the trailing newline
        a = json.loads(dictbuf)
        hashtable_album[itembuf] = a
hashtable_item = {}
i = 0
with open(filename2, 'rb') as f2:
    for line in f2:  # originally "for line in f", which re-read the wrong (already closed) file
        print len(hashtable_item)  # originally "print len(dic)"; dic was undefined
        print i
        q = line.find("####")
        itembuf = line[:q]
        dictbuf = line[q + 4:-1]
        a = json.loads(dictbuf)
        hashtable_item[itembuf] = a
        i += 1

The first file is about 400 MB, bigger than the second one, which is about 200 MB, and I can load the first file successfully. But when I load the second file I get a memory error:

  Traceback (most recent call last):
    File "E:/py_workspace/1.0_memory_error.py", line 44, in <module>
      hashtable_item[itembuf] = a
  MemoryError

If I change the order and read file2 first and file1 after, I also get a memory error while loading the second file. I guess the memory error comes from the dicts, so after loading file1 I clear its dict:

hashtable_album = {}

and go on loading file2. This time it works with no memory error. But I need to use these two dicts at the same time, so how can I load them together?

Tip: I tried cPickle to save the dicts, but that does not work either; I get the same memory error.

You are probably running 32-bit Python.

Verify this with:

$ python -c "import sys; print sys.maxint"      # 64-bit Python
9223372036854775807

$ python-32 -c "import sys; print sys.maxint"   # 32-bit Python
2147483647
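
On Python 3, sys.maxint no longer exists; a pointer-size check works on either version. This is just a convenience sketch using the standard struct module:

import struct

# A pointer is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build,
# so this prints 32 or 64 regardless of Python version.
print(struct.calcsize("P") * 8)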

If you cannot get away from running in 32-bit space, then you have a few options:

  1. Learn C, and do the processing in C. Given the input file sizes, strict manual memory management (mallocs/callocs) might allow you to hold everything in memory.
  2. If your algorithm allows map-reduce-style processing, it is probably faster to learn map-reduce, do partial file processing in each step, and combine the results in the final step (see the sketch after this list).
  3. Not sure, but you might want to give Cython a try.
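
For option 2, here is a minimal sketch of the partial-processing idea adapted to the files above. It assumes the same "key####json" line format and that each record of the streamed file only needs to be matched against the in-memory dict once, so the second dict never has to exist in full (keep whichever file is smaller in memory and stream the other):

import json

# Step 1: load only one file into a dict.
hashtable_album = {}
with open(filename, 'rb') as f:
    for line in f:
        q = line.find("####")
        hashtable_album[line[:q]] = json.loads(line[q + 4:-1])

# Step 2: stream the other file and handle each record immediately,
# instead of materializing hashtable_item in memory.
with open(filename2, 'rb') as f2:
    for line in f2:
        q = line.find("####")
        item = json.loads(line[q + 4:-1])
        album = hashtable_album.get(line[:q])
        # ... combine item and album here, then let them go out of scope ...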
