
Multiprocessing a dictionary in Python

I have two dictionaries of data, and I created a function that acts as a rules engine to analyze the entries in each dictionary and act based on specific metrics I set (if it helps, each entry in the dictionary is a node in a graph, and if the rules match I create an edge between them).

Here's the code I use (it's a for loop that passes parts of the dictionary to a rules function; I refactored my code following a tutorial I read):

import multiprocessing

jobs = []

def loadGraph(dayCurrent, day2Previous):
    # graph and rules are defined elsewhere in the program
    for dayCurrentCount in graph[dayCurrent]:
        dayCurrentValue = graph[dayCurrent][dayCurrentCount]
        for day1Count in graph[day2Previous]:
            day1Value = graph[day2Previous][day1Count]
            # previously a direct call:
            # rules(day1Count, day1Value, dayCurrentCount, dayCurrentValue, dayCurrent, day2Previous)
            # now a new process is started for every pair of entries:
            p = multiprocessing.Process(target=rules, args=(day1Count, day1Value, dayCurrentCount, dayCurrentValue, dayCurrent, day2Previous))
            jobs.append(p)
            p.start()
            print(' in rules engine for day', dayCurrentCount, ' and we are about ', (len(graph[dayCurrent]) - dayCurrentCount) / float(len(graph[dayCurrent])))

The data I'm studying could be rather large (could, because it's randomly generated). I think there are about 50,000 entries for each day. Because most of the time is spent on this stage, I was wondering if I could use the 8 cores I have available to process this faster.

Because each dictionary entry is compared against the dictionary entries from the day before, I thought the processes could be split up along those lines, but my code above is slower than the plain single-process version. I think this is because it creates a new process for every entry it handles.

Is there a way to speed this up and use all my CPUs? My problem is that I don't want to pass the entire dictionary, because then one core would get stuck processing all of it; I would rather split the work across the CPUs so that every free CPU is kept busy.
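For reference, here is a minimal sketch of that kind of split using multiprocessing.Pool: the pool hands each worker a batch of current-day entries, and each entry is compared against the whole previous day. compare_entry is a hypothetical helper; graph, rules, dayCurrent, and day2Previous are the names from the code above, assumed to exist at module level.

    import multiprocessing

    def compare_entry(item):
        # Hypothetical helper: takes one (key, value) pair from the current
        # day and compares it against every entry from the previous day.
        # Assumes graph, rules, dayCurrent, day2Previous are module globals.
        dayCurrentCount, dayCurrentValue = item
        for day1Count, day1Value in graph[day2Previous].items():
            rules(day1Count, day1Value, dayCurrentCount, dayCurrentValue, dayCurrent, day2Previous)

    if __name__ == '__main__':
        with multiprocessing.Pool(processes=8) as pool:
            # chunksize batches the entries so each of the 8 workers
            # receives blocks of work instead of one item at a time
            pool.map(compare_entry, graph[dayCurrent].items(), chunksize=1000)

The key difference from the code in the question is that only 8 processes are ever created; the per-entry cost is a cheap function call inside an existing worker rather than a process launch.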

I'm totally new to multiprocessing, so I'm sure there's something easy I'm missing. Any advice, suggestions, or reading material would be great!

What I've done in the past is to create a "worker class" that processes data entries. Then I spin up X threads, each running a copy of the worker class. Each item in the dataset gets pushed onto a queue that the worker threads are watching. When there are no more items in the queue, the threads spin down.

Using this method, I was able to process 10,000+ data items using 5 threads in about 3 seconds. When the app was single-threaded, this took significantly longer.

Check out: http://docs.python.org/library/queue.html
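A minimal sketch of that pattern, assuming a hypothetical process_entry function stands in for the real per-item work:

    import threading
    import queue

    def process_entry(item):
        # Hypothetical stand-in for the real work on one data item.
        pass

    def worker(q):
        # Each thread repeatedly pulls an item off the shared queue.
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                break  # queue drained: this thread spins down
            process_entry(item)
            q.task_done()

    dataset = list(range(10000))  # stand-in for the real data items
    q = queue.Queue()
    for item in dataset:
        q.put(item)

    threads = [threading.Thread(target=worker, args=(q,)) for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

One caveat: Python threads share a single interpreter, so this helps most when the per-item work is I/O-bound. For CPU-bound work like the rules engine in the question, the same pattern applies with multiprocessing.Process and multiprocessing.JoinableQueue in place of their threading counterparts.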

I would recommend looking into MapReduce implementations in Python. Here's one: http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=mapreduce+python . Also, take a look at a Python package called Celery: http://celeryproject.org/ . With Celery you can distribute your computation not only among the cores of a single machine, but also to a server farm (cluster). You do pay for that flexibility with a more involved setup and maintenance.
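A minimal sketch of what the Celery route might look like; the module name, Redis broker URL, and task body are assumptions for illustration, not part of the answer:

    # tasks.py
    from celery import Celery

    # The broker URL is an assumption; any supported broker works.
    app = Celery('tasks', broker='redis://localhost:6379/0')

    @app.task
    def compare_pair(day1Count, day1Value, dayCurrentCount, dayCurrentValue):
        # Stand-in for the question's rules() check on one pair of entries.
        pass

    # Caller side: fan out one task per pair of entries. Workers on any
    # machine pointed at the same broker can pick them up; start them with:
    #   celery -A tasks worker
    #
    # for args in pairs:
    #     compare_pair.delay(*args)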
