Using multiprocessing in Python to divide a for loop

Question

I want to create a graph with 2600 nodes. Because I have to compare every node against every other node to decide whether to create an edge, my for loop runs almost 6 million times, so I am trying to use multiprocessing to make it faster. What I want is to create 20 processes and divide the 6 million iterations between them. I wrote the following code, but it doesn't work.

from igraph import *
from multiprocessing import Process    

def generate_edges(call_record_dict):
    for index, key in enumerate(call_record_dict):
        for index2, key2 in enumerate(call_record_dict):
            if(key!=key2):
                if(call_record_dict[key]==call_record_dict[key2]):
                    if(g.are_connected(index, index2) is False):
                        g.add_edges((index,index2))

def generate_graph(call_record_dict):
    g=Graph()
    g.add_vertices(len(call_record_dict))
    for i in range(20):
        p = Process(target=generate_edges, args=(call_record_dict))
        p.start()

I've also tried using Pool.

pool = Pool(processes=20)
pool.map(generate_edges,call_record_dict)
pool.close()
pool.join()

This doesn't solve the problem either.

Answer 1

Try this:

from functools import partial
import multiprocessing as mp

from igraph import Graph


def generate_edges(index, values):
    # Runs in a worker: return every edge (index, index2) with index < index2
    # whose call records are equal. Workers must not touch the graph itself.
    return [(index, index2)
            for index2 in range(index + 1, len(values))
            if values[index] == values[index2]]


def generate_graph(call_record_dict):
    values = list(call_record_dict.values())
    g = Graph()
    g.add_vertices(len(values))
    with mp.Pool(4) as pool:
        # Each pair is produced once, so no are_connected check is needed.
        worker = partial(generate_edges, values=values)
        for edges in pool.map(worker, range(len(values))):
            if edges:
                g.add_edges(edges)  # the graph is only modified in the parent
    return g

I don't have igraph and have never used it, so sorry, this is not fully tested.
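To divide the outer loop between a fixed number of processes, as the question asks, one option is to give each worker a contiguous range of row indices and have it return the edges it finds, so that the graph is only ever modified in the parent. The following is a minimal sketch under that assumption. The helper names `chunk_ranges`, `edges_in_range`, and `parallel_edges` are hypothetical, and plain tuples stand in for the igraph calls.

```python
import multiprocessing as mp


def chunk_ranges(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) ranges."""
    size, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + size + (1 if i < extra else 0)
        if stop > start:  # skip empty ranges when parts > n
            ranges.append((start, stop))
        start = stop
    return ranges


def edges_in_range(task):
    """Return edges (i, j), i < j, with equal values, for i in [start, stop)."""
    values, start, stop = task
    return [(i, j)
            for i in range(start, stop)
            for j in range(i + 1, len(values))
            if values[i] == values[j]]


def parallel_edges(values, workers=4):
    """Compute all matching pairs using a pool of worker processes."""
    tasks = [(values, start, stop)
             for start, stop in chunk_ranges(len(values), workers)]
    with mp.Pool(workers) as pool:
        chunks = pool.map(edges_in_range, tasks)
    # Flatten the per-worker lists; the parent could pass this to g.add_edges.
    return [edge for chunk in chunks for edge in chunk]
```

On platforms that start workers with spawn (Windows, and macOS by default), `parallel_edges` should be called from under an `if __name__ == "__main__":` guard. Note that equal-sized ranges are not equal amounts of work here, because low row indices are compared against more partners; using more, smaller chunks than workers is one way to even that out.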

Answer 2

If the values in call_record_dict are hashable and are unique for each connection, you might try a different approach.

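from collections import defaultdict
import itertools as it

# g is the Graph from the question, with one vertex per dictionary entry.
matches = defaultdict(list)

# Group vertex indices by their call record value.
for index, value in enumerate(call_record_dict.values()):
    matches[value].append(index)

# Connect every pair of vertices that share a value.
for indices in matches.values():
    for index1, index2 in it.combinations(indices, 2):
        # add_edge takes source and target as separate arguments; one call per
        # pair is enough on an undirected graph.
        g.add_edge(index1, index2)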

The grouping pass runs in O(n), where n is the length of the dictionary, rather than the O(n^2) of comparing every pair. Adding the edges then takes time proportional to the number of matching pairs.
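To check that grouping by value yields the same pairs as the nested loop, here is a small self-contained sketch that leaves igraph out and compares the two on a made-up dictionary. The sample data and the function names `edges_by_grouping` and `edges_by_brute_force` are illustrative assumptions, not part of the answer above.

```python
from collections import defaultdict
from itertools import combinations


def edges_by_grouping(call_record_dict):
    """Group vertex indices by value, then pair up indices within each group."""
    groups = defaultdict(list)
    for index, value in enumerate(call_record_dict.values()):
        groups[value].append(index)
    edges = []
    for indices in groups.values():
        edges.extend(combinations(indices, 2))
    return edges


def edges_by_brute_force(call_record_dict):
    """Compare every pair of entries directly, as the nested loop does."""
    values = list(call_record_dict.values())
    return [(i, j)
            for i in range(len(values))
            for j in range(i + 1, len(values))
            if values[i] == values[j]]


# Hypothetical sample data: three entries share one value, one stands alone.
sample = {
    "alice": "tower-1",
    "bob": "tower-2",
    "carol": "tower-1",
    "dave": "tower-1",
}

print(sorted(edges_by_grouping(sample)))  # [(0, 2), (0, 3), (2, 3)]
print(sorted(edges_by_grouping(sample)) == sorted(edges_by_brute_force(sample)))  # True
```

The resulting list of index pairs is in the form that igraph's `Graph.add_edges` accepts, so all edges can be added in a single call.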

2 answers

Answer 1
0 2016-02-24 20:05:23

Answer 2
0 accepted 2016-02-25 04:09:46
