python: how to create persistent in-memory structure for debugging
[Python 3.1]
My program takes a long time to run simply because of the pickle.load call on a huge data structure. This makes debugging very annoying and time-consuming: every time I make a small change, I have to wait a few minutes to see whether the regression tests pass.
I would like to replace pickle with an in-memory data structure.
I thought of starting a Python program in one process and connecting to it from another, but I am afraid the inter-process communication overhead would be huge.
Perhaps I could run a Python function from the interpreter to load the structure into memory. Then, as I modify the rest of the program, I could run it many times without exiting the interpreter in between. This seems like it would work, but I'm not sure whether I would suffer any overhead or other problems.
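The interpreter-based workflow can be sketched with importlib.reload: load the expensive structure once, then reload only the code under test after each edit. This is a minimal, self-contained demo — the module `under_test.py` and its `check` function are illustrative stand-ins written to disk just so the sketch runs on its own; in practice you would reload your real module and keep the unpickled data bound to a variable in the session.

```python
# Sketch of the reload workflow: keep the expensive-to-load data in the
# interpreter and importlib.reload() the code under test after each edit.
# The tiny module written here is a stand-in for the real program.
import importlib
import pathlib
import sys

sys.dont_write_bytecode = True        # always recompile from source on reload
pathlib.Path('under_test.py').write_text(
    'def check(data):\n    return len(data)\n')
sys.path.insert(0, '.')

import under_test
data = list(range(1000000))           # stand-in for the slow pickle.load result
print(under_test.check(data))         # 1000000

# Simulate editing the module, then reload it without touching `data`.
pathlib.Path('under_test.py').write_text(
    'def check(data):\n    return sum(data)\n')
importlib.reload(under_test)
print(under_test.check(data))         # 499999500000
```

The overhead of reload itself is negligible next to re-unpickling the data; the main caveat is that objects created from the old class definitions keep their old classes after a reload.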
You can use mmap to open views of the same file in multiple processes, and access it at almost memory speed once the file is loaded.
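A minimal sketch of the mmap idea (the file name is illustrative). Note that mmap exposes raw bytes, so by itself it does not skip the unpickling cost — it helps most when several processes need views of the same large file, or when your data is already in a flat binary layout:

```python
# Minimal mmap sketch: map a file into memory and read/write it as bytes.
# A second process mapping the same file would see these writes too.
import mmap

with open('shared.bin', 'wb') as f:
    f.write(b'\x00' * 4096)            # pre-size the backing file

with open('shared.bin', 'r+b') as f:
    view = mmap.mmap(f.fileno(), 0)    # map the whole file
    view[0:5] = b'hello'               # write through the map
    view.flush()                       # push changes to the file
    print(view[0:5])                   # b'hello'
    view.close()
```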
First, you can pickle different parts of the whole object using this method:
# gen_objects.py
import random
import pickle

class BigBadObject(object):
    def __init__(self):
        # A dictionary, a list, and a string of random sizes and contents.
        self.a_dictionary = {}
        for x in range(random.randint(1, 1000)):
            self.a_dictionary[random.randint(1, 98675676)] = random.random()
        self.a_list = []
        for x in range(random.randint(1000, 10000)):
            self.a_list.append(random.random())
        self.a_string = ''.join([chr(random.randint(65, 90))
                                 for x in range(random.randint(100, 10000))])

if __name__ == "__main__":
    with open('lotsa_objects.pickled', 'wb') as output:
        for i in range(10000):
            pickle.dump(BigBadObject(), output, pickle.HIGHEST_PROTOCOL)
Once you have generated the big file in separate parts, you can read it with a Python program in which a reader runs alongside the main thread, each working on a different part of the job:
# reader.py
from threading import Thread
from queue import Queue, Empty
from functools import reduce
import pickle
import operator
from gen_objects import BigBadObject

class Reader(Thread):
    def __init__(self, filename, q):
        Thread.__init__(self)
        self._file = open(filename, 'rb')
        self._queue = q

    def run(self):
        # Unpickle objects one at a time and hand them to the consumer.
        while True:
            try:
                one_object = pickle.load(self._file)
            except EOFError:
                break
            self._queue.put(one_object)

class uncached(object):
    def __init__(self, filename, queue_size=100):
        self._my_queue = Queue(maxsize=queue_size)
        self._my_reader = Reader(filename, self._my_queue)
        self._my_reader.start()

    def __iter__(self):
        # Loop until we get something or the reader is done and drained.
        while True:
            try:
                print("Getting from the queue. Queue size =", self._my_queue.qsize())
                o = self._my_queue.get(True, timeout=0.1)  # block for 0.1 seconds
                yield o
            except Empty:
                # Stop only when the reader has finished AND the queue is empty,
                # so no prefetched objects are dropped.
                if not self._my_reader.is_alive():
                    break

# Compute an average of all the numbers in a_list, just for show.
list_avg = 0.0
list_count = 0
for x in uncached('lotsa_objects.pickled'):
    list_avg += reduce(operator.add, x.a_list)
    list_count += len(x.a_list)
print("Average: ", list_avg / list_count)
Reading the pickle file this way can be much faster than loading everything up front, because the reader thread unpickles objects while the main thread processes them, with up to 100 objects prefetched in the queue.
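Since the objects were dumped as independent pickles, the file can also be split into several part files and read truly in parallel. This is a hedged sketch, not part of the original answer: multiprocessing is swapped in for threads (so unpickling is not serialized by the GIL), the part-file naming is an assumption, and small dummy parts are generated inline so the sketch runs standalone.

```python
# Hedged sketch: read several pickle part files in parallel, one process
# per part, then combine the partial sums. Part naming is illustrative.
import pickle
from multiprocessing import Pool

def sums_from_part(filename):
    """Return (sum, count) over every a_list in one part file."""
    total, count = 0.0, 0
    with open(filename, 'rb') as f:
        while True:
            try:
                obj = pickle.load(f)
            except EOFError:
                break
            total += sum(obj.a_list)
            count += len(obj.a_list)
    return total, count

if __name__ == '__main__':
    # Build small dummy part files so the sketch runs standalone; in the
    # real setup these would be produced by a split version of gen_objects.py.
    import random
    import types
    parts = ['lotsa_objects_%d.pickled' % i for i in range(4)]
    for name in parts:
        with open(name, 'wb') as f:
            for _ in range(100):
                pickle.dump(types.SimpleNamespace(
                    a_list=[random.random() for _ in range(50)]), f)

    with Pool(len(parts)) as pool:
        results = pool.map(sums_from_part, parts)
    total = sum(t for t, _ in results)
    count = sum(c for _, c in results)
    print('Average:', total / count)
```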