How can I consume a Python generator in parallel using multiprocessing?

Question

How can I improve the performance of the networkx function local_bridges (https://networkx.org/documentation/stable//reference/algorithms/generated/networkx.algorithms.bridges.local_bridges.html)?

I have already tried pypy - but so far I am still stuck consuming the generator on a single core. My graph has 300k edges. An example:

# construct the nx Graph:
import networkx as nx
# construct an undirected graph here - this is just a dummy graph
G = nx.cycle_graph(300000)

# fast - as it only returns a generator/iterator
lb = nx.local_bridges(G)

# individual item is also fast
%%time
next(lb)
CPU times: user 1.01 s, sys: 11 ms, total: 1.02 s
Wall time: 1.02 s

# computing all the values is very slow.
lb_list = list(lb)

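How can I consume this iterator in parallel so that all processor cores are used? The current naive implementation uses only a single core!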

My naive first attempt with multiprocessing was:

import multiprocessing as mp
lb = nx.local_bridges(G)
pool = mp.Pool()
lb_list = list(pool.map((), lb))

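However, I do not want to apply a specific function (the () placeholder above); I only want to fetch the next element from the iterator in parallel.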

Related: python or dask parallel generator?

Edit

I guess it boils down to how to parallelize:

lb_res = []
lb = nx.local_bridges(G)
for node in range(1, len(G) +1):
    lb_res.append(next(lb))
    
lb_res

Naively using multiprocessing obviously fails:

# from multiprocessing import Pool
# https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror
from multiprocess import Pool
lb_res = []
lb = nx.local_bridges(G)

def my_function(thing):
    return next(thing)

with Pool(5) as p:
    parallel_result = p.map(my_function, range(1, len(G) +1))
    
parallel_result

But it is not clear to me how to pass the generator as an argument to the map function and consume the generator completely.

Edit 2

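For this particular problem, it turns out that the bottleneck is the shortest-path computation triggered by the with_span=True parameter. With it disabled, the function is reasonably fast.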
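To illustrate the difference between the two modes, here is a minimal sketch on a small cycle graph. It assumes the local_bridges(G, with_span=...) signature from the networkx documentation linked above; the 10-node graph is only a toy stand-in for the real 300k-edge graph.

```python
import networkx as nx

# Toy graph (assumption): a 10-node cycle, where no edge has endpoints
# that share a neighbor, so every edge is a local bridge.
G = nx.cycle_graph(10)

# with_span=False: only the common-neighbor test runs; yields (u, v).
edges_only = list(nx.local_bridges(G, with_span=False))

# with_span=True (the default): additionally runs a shortest-path search
# for each local bridge with that edge hidden; yields (u, v, span).
with_spans = list(nx.local_bridges(G, with_span=True))

print(len(edges_only), edges_only[0])
print(len(with_spans), with_spans[0])
```

On a large graph, the per-edge shortest-path search is the part that dominates the runtime, which is consistent with the timing observed above.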

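When the span does need to be computed, I suggest cugraph, which has a fast SSSP implementation. Nevertheless, the iteration over the set of edges still does not happen in parallel and should be improved further.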

However, to learn more, I am still interested in how the consumption of a generator can be parallelized in Python.

Answer 1

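You cannot consume a generator in parallel: the next state of every non-trivial generator is determined by its current state, so next() has to be called sequentially.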
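What can be parallelized is the work done on each item after it has been drawn from the generator. The following is a minimal sketch of that pattern, not the approach of the original poster: numbers() and expensive() are hypothetical stand-ins, and Pool.imap is used because it accepts any iterable, pulls items from it in order in the parent process, and hands them to the workers.

```python
import multiprocessing as mp
from functools import partial


def numbers(n):
    """A stateful generator: each value depends on the previous state."""
    state = 0
    for _ in range(n):
        state += 1
        yield state


def expensive(x):
    """Hypothetical stand-in for costly per-item work."""
    return x * x


def consume(items, map_fn=map):
    """Apply expensive() to every item; map_fn decides how the work runs."""
    return list(map_fn(expensive, items))


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        # The generator itself is still advanced sequentially in the parent;
        # only the calls to expensive() run in the worker processes.
        squares = consume(numbers(10), partial(pool.imap, chunksize=4))
    print(squares)
```

This only pays off when the per-item work is expensive compared with producing the item. If the cost sits inside the generator itself, as with local_bridges, the generator's body has to be restructured instead.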

According to https://github.com/networkx/networkx/blob/master/networkx/algorithms/bridges.py#L162 this is how the function is implemented:

for u, v in G.edges:
    if not (set(G[u]) & set(G[v])):
        yield u, v

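So you can parallelize it with something like the following, but you will have to pay the price of merging the separate lists, using something like multiprocessing.Manager. I think this will only make the whole thing slower, but you can time it yourself.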

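import os
from multiprocessing import Pool

def process_edge(e):
    u, v = e
    lb_list = []
    # an edge is a local bridge if its endpoints share no neighbor
    if not (set(G[u]) & set(G[v])):
        lb_list.append((u, v))
    return lb_list  # return the result so the parent process can collect it

with Pool(os.cpu_count()) as pool:
    results = pool.map(process_edge, G.edges)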

Another approach is to split the graph into ranges of vertices and process them concurrently.

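import os
import numpy as np
from multiprocessing import Pool

def process_nodes(nodes):
    lb_list = []
    for u in nodes:
        for v in G[u]:
            # note: in an undirected graph each edge is seen from both endpoints
            if not (set(G[u]) & set(G[v])):
                lb_list.append((u, v))
    return lb_list  # return the partial list to the parent process

with Pool(os.cpu_count()) as pool:
    # assumes the nodes are labelled 0..n-1, as in the cycle_graph example
    chunks = np.array_split(list(range(G.number_of_nodes())), os.cpu_count())
    results = pool.map(process_nodes, chunks)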
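The node-range idea can be sketched end to end with the standard library only. This is an illustrative sketch under stated assumptions, not the networkx implementation: ADJ is a hypothetical dict of neighbor sets standing in for G, split() is a hypothetical helper replacing np.array_split, and the u < v test (which assumes comparable node labels) reports each undirected edge once.

```python
import multiprocessing as mp
from itertools import chain

# Hypothetical stand-in for G: node -> set of neighbors.
# A 6-cycle plus the chord 0-2, so edges 0-1, 1-2 and 0-2 form a triangle.
ADJ = {
    0: {1, 2, 5},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3, 5},
    5: {0, 4},
}


def bridges_for_nodes(nodes):
    """Return the local bridges (u, v), u < v, whose smaller endpoint is in nodes."""
    found = []
    for u in nodes:
        for v in ADJ[u]:
            # u < v reports each undirected edge once
            if u < v and not (ADJ[u] & ADJ[v]):
                found.append((u, v))
    return found


def split(seq, parts):
    """Split seq into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(seq), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks


def local_bridges_chunked(map_fn=map, parts=2):
    """Process node chunks through map_fn and merge the partial lists."""
    chunks = split(sorted(ADJ), parts)
    return sorted(chain.from_iterable(map_fn(bridges_for_nodes, chunks)))


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        print(local_bridges_chunked(pool.map))
```

Each worker returns one partial list per chunk, so merging is a single chain over a handful of lists. The adjacency data is a module-level global here, so every worker process has its own copy; for a large graph that copy is part of the overhead worth timing.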

You could also check whether a better algorithm exists for this problem, or look for a faster library implemented in C.

Solution 1
1 accepted 2021-02-21 12:44:10
