python: how to create persistent in-memory structure for debugging
[Python 3.1]
My program takes a long time to run simply because of the pickle.load call on a huge data structure. This makes debugging very annoying and time-consuming: every time I make a small change, I have to wait a few minutes to see whether the regression tests pass.
I would like to replace pickle with an in-memory data structure.
I thought of starting a Python program in one process and connecting to it from another, but I am afraid the inter-process communication overhead would be huge.
Perhaps I could run a Python function from the interpreter to load the structure into memory. Then, as I modify the rest of the program, I could run it many times without exiting the interpreter in between. This seems like it would work, but I'm not sure whether I would suffer any overhead or other problems.
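The interpreter-based workflow can be sketched with importlib.reload: load the expensive structure once, then reload only the code under test after each edit. This is a minimal, self-contained demo — the module `under_test.py` and its `check` function are illustrative stand-ins written to disk just so the sketch runs on its own; in practice you would reload your real module and keep the unpickled data bound to a variable in the session.

```python
# Sketch of the reload workflow: keep the expensive-to-load data in the
# interpreter and importlib.reload() the code under test after each edit.
# The tiny module written here is a stand-in for the real program.
import importlib
import pathlib
import sys

sys.dont_write_bytecode = True        # always recompile from source on reload
pathlib.Path('under_test.py').write_text(
    'def check(data):\n    return len(data)\n')
sys.path.insert(0, '.')

import under_test
data = list(range(1000000))           # stand-in for the slow pickle.load result
print(under_test.check(data))         # 1000000

# Simulate editing the module, then reload it without touching `data`.
pathlib.Path('under_test.py').write_text(
    'def check(data):\n    return sum(data)\n')
importlib.reload(under_test)
print(under_test.check(data))         # 499999500000
```

The overhead of reload itself is negligible next to re-unpickling the data; the main caveat is that objects created from the old class definitions keep their old classes after a reload.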
You can use mmap to open views of the same file in multiple processes, and access it at almost memory speed once the file is loaded.
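A minimal sketch of the mmap idea (the file name is illustrative). Note that mmap exposes raw bytes, so by itself it does not skip the unpickling cost — it helps most when several processes need views of the same large file, or when your data is already in a flat binary layout:

```python
# Minimal mmap sketch: map a file into memory and read/write it as bytes.
# A second process mapping the same file would see these writes too.
import mmap

with open('shared.bin', 'wb') as f:
    f.write(b'\x00' * 4096)            # pre-size the backing file

with open('shared.bin', 'r+b') as f:
    view = mmap.mmap(f.fileno(), 0)    # map the whole file
    view[0:5] = b'hello'               # write through the map
    view.flush()                       # push changes to the file
    print(view[0:5])                   # b'hello'
    view.close()
```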
First, you can pickle different parts of the whole object using this method:
# gen_objects.py
import random
import pickle

class BigBadObject(object):
    def __init__(self):
        # A dictionary, a list, and a string of random sizes and contents.
        self.a_dictionary = {}
        for x in range(random.randint(1, 1000)):
            self.a_dictionary[random.randint(1, 98675676)] = random.random()
        self.a_list = []
        for x in range(random.randint(1000, 10000)):
            self.a_list.append(random.random())
        self.a_string = ''.join([chr(random.randint(65, 90))
                                 for x in range(random.randint(100, 10000))])

if __name__ == "__main__":
    with open('lotsa_objects.pickled', 'wb') as output:
        for i in range(10000):
            pickle.dump(BigBadObject(), output, pickle.HIGHEST_PROTOCOL)
Once you have generated the big file in separate parts, you can read it with a Python program in which a reader runs alongside the main thread, each working on a different part of the job:
# reader.py
from threading import Thread
from queue import Queue, Empty
from functools import reduce
import pickle
import operator
from gen_objects import BigBadObject

class Reader(Thread):
    def __init__(self, filename, q):
        Thread.__init__(self)
        self._file = open(filename, 'rb')
        self._queue = q

    def run(self):
        # Unpickle objects one at a time and hand them to the consumer.
        while True:
            try:
                one_object = pickle.load(self._file)
            except EOFError:
                break
            self._queue.put(one_object)

class uncached(object):
    def __init__(self, filename, queue_size=100):
        self._my_queue = Queue(maxsize=queue_size)
        self._my_reader = Reader(filename, self._my_queue)
        self._my_reader.start()

    def __iter__(self):
        # Loop until we get something or the reader is done and drained.
        while True:
            try:
                print("Getting from the queue. Queue size =", self._my_queue.qsize())
                o = self._my_queue.get(True, timeout=0.1)  # block for 0.1 seconds
                yield o
            except Empty:
                # Stop only when the reader has finished AND the queue is empty,
                # so no prefetched objects are dropped.
                if not self._my_reader.is_alive():
                    break

# Compute an average of all the numbers in a_list, just for show.
list_avg = 0.0
list_count = 0
for x in uncached('lotsa_objects.pickled'):
    list_avg += reduce(operator.add, x.a_list)
    list_count += len(x.a_list)
print("Average: ", list_avg / list_count)
Reading the pickle file this way can be much faster than loading everything up front, because the reader thread unpickles objects while the main thread processes them, with up to 100 objects prefetched in the queue.
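Since the objects were dumped as independent pickles, the file can also be split into several part files and read truly in parallel. This is a hedged sketch, not part of the original answer: multiprocessing is swapped in for threads (so unpickling is not serialized by the GIL), the part-file naming is an assumption, and small dummy parts are generated inline so the sketch runs standalone.

```python
# Hedged sketch: read several pickle part files in parallel, one process
# per part, then combine the partial sums. Part naming is illustrative.
import pickle
from multiprocessing import Pool

def sums_from_part(filename):
    """Return (sum, count) over every a_list in one part file."""
    total, count = 0.0, 0
    with open(filename, 'rb') as f:
        while True:
            try:
                obj = pickle.load(f)
            except EOFError:
                break
            total += sum(obj.a_list)
            count += len(obj.a_list)
    return total, count

if __name__ == '__main__':
    # Build small dummy part files so the sketch runs standalone; in the
    # real setup these would be produced by a split version of gen_objects.py.
    import random
    import types
    parts = ['lotsa_objects_%d.pickled' % i for i in range(4)]
    for name in parts:
        with open(name, 'wb') as f:
            for _ in range(100):
                pickle.dump(types.SimpleNamespace(
                    a_list=[random.random() for _ in range(50)]), f)

    with Pool(len(parts)) as pool:
        results = pool.map(sums_from_part, parts)
    total = sum(t for t, _ in results)
    count = sum(c for _, c in results)
    print('Average:', total / count)
```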