简体   繁体   中英

fast file loading on python

I have two problems with loading data on python, both the scipts work properly but they need too much time to run and sometimes "Killed" is the result (with the first one).

  1. I have a big zipped text file and I do something like this:

     import gzip import cPickle as pickle f = gzip.open('filename.gz','r') tab={} for line in f: #fill tab with open("data_dict.pkl","wb") as g: pickle.dump(tab,g) f.close() 
  2. I have to do some operations on the dictionary I created in the previous script

     import cPickle as pickle with open("data_dict.pkl", "rb") as f: tab = pickle.load(f) f.close() #operations on tab (the dictionary) 

Do you have other solutionsin mind? Maybe not the ones involving YAML or JSON...

If the data you are pickling is primitive and simple, you can try marshal module: http://docs.python.org/3/library/marshal.html#module-marshal . That's what Python uses to serialize its bytecode, so it's pretty fast.

First one comment, in:

with open("data_dict.pkl", "rb") as f:
        tab = pickle.load(f)
f.close()

f.close() is not necessary, the context manager ( with syntax) does that automatically.

Now as for speed, I don't think you're going to get too much faster than cPickle for the purpose of reading in something from disk directly as a Python object. If this script needs to be run over and over I would try using memchached via pylibmc to keep the object stored persistently in memory so you can access it lightning fast:

import pylibmc

mc = pylibmc.Client(["127.0.0.1"], binary=True,behaviors={"tcp_nodelay": True,"ketama": True})
d = range(10000)          ## some big object
mc["some_key"] = d        ## save in memory

Then after saving it once you can access and modify it, it stays in memory even after the previous program finishes its execution:

import pylibmc
mc = pylibmc.Client(["127.0.0.1"], binary=True,behaviors={"tcp_nodelay": True,"ketama": True})
d = mc["some_key"]        ## load from memory
d[0] = 'some other value' ## modify
mc["some_key"] = d        ## save to memory again

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM