简体   繁体   English

有效地为每个节点 igraph 提取权重最大的边

[英]Efficiently extract edges with biggest weight for each node igraph

I have a weighted undirected simple graph.我有一个加权无向简单图。
The number of nodes is around the tens of thousands and so does the number of edges (it's a sparse graph / matrix).节点数约为数万,边数也是如此(这是一个稀疏图/矩阵)。

For each node, I want to find the maximum edge weight ("max score") and store it in a data frame along with the nodes who share that edge.对于每个节点,我想找到最大边权重(“最大分数”)并将其与共享该边的节点一起存储在数据框中。 The data frame will have three columns: node_name - str, max_score - float [0-1], max_score_nodes - List[str]数据框将包含三列: node_name - str, max_score - float [0-1], max_score_nodes - List[str]

My current solution is provided below, but it's not elegant, there are multiple list comprehensions (one of them is nested) inside a for loop, checking for nodes without edges etc. and I feel there's a smarter way of doing this.下面提供了我当前的解决方案,但它并不优雅,在 for 循环中有多个列表推导(其中一个是嵌套的),检查没有边的节点等,我觉得有一种更聪明的方法可以做到这一点。

import string
import igraph as ig
import numpy as np
import pandas as pd

nodes = list(string.ascii_letters[0:6].upper())
edges = [("A", "F"), ("A", "C"), ("F", "D"), ("D", "C"), ("D", "E")]
weights = [0.6, 0.4, 0.3, 0.9, 0.9]

w_graph = ig.Graph(directed=False)
w_graph.add_vertices(nodes)
w_graph.add_edges(edges, {"weight": weights})

records = {}
for node in nodes:
    local_edges = np.array(w_graph.vs.find(node).all_edges())
    if local_edges.size == 0:
        records.update({node: {"max_score": 0, "max_score_nodes": np.nan}})
        continue
    local_weights = [local_edge["weight"] for local_edge in local_edges]

    max_score = np.max(local_weights)
    max_score_ind = np.where(local_weights == max_score)[0]
    max_score_edges = local_edges[max_score_ind]

    vertex_tuples = [edge.vertex_tuple for edge in max_score_edges]
    max_score_nodes = [
        [vertex["name"] for vertex in vertex_tuple if vertex["name"] != node][0] for
        vertex_tuple in vertex_tuples]

    records.update({node: {"max_score": max_score, "max_score_nodes": max_score_nodes}})

output = pd.DataFrame.from_dict(records, orient="index")
output_with_node_name = output.rename_axis("node_name").reset_index()
print(output_with_node_name)

I discovered SciPy's csr matrix which efficiently handles sparse matrix in a compressed row format.我发现了 SciPy 的csr 矩阵,它可以有效地处理压缩行格式的稀疏矩阵。

There's an igraph method which returns the graph's adjacency matrix as a CSR matrix.有一个igraph 方法将图形的邻接矩阵作为 CSR 矩阵返回。

From here we can utilize vectorized operations on rows, so the whole nested for loop is now reduced into 4 lines:从这里我们可以对行使用向量化操作,因此整个嵌套的 for 循环现在减少为 4 行:

scores_matrix = w_graph.get_adjacency_sparse(attribute="weight")

node_names = w_graph.vs["name"]
max_scores = scores_matrix.max(axis=0).toarray()[0]
max_score_nodes = w_graph.vs[scores_matrix.argmax(axis=0).tolist()[0]]["name"]
max_score_nodes = np.where(max_scores == 0, np.nan, max_score_nodes)

output = pd.DataFrame({"node_name": node_names,
                       "max_score": max_scores,
                       "max_score_nodes": max_score_nodes})
print(output)

The problem is that argmax will return only a single index, if you want to have all node_names with max_score you can loop like this:问题是argmax只会返回一个索引,如果你想让所有 node_names 都带有max_score ,你可以像这样循环:

from scipy.sparse import find

max_score_nodes = []
for node_index in w_graph.vs.indices:
    max_score = max_scores[node_index]
    if max_score == 0:
        max_score_nodes.append(np.nan)
        continue

    max_score_node_ind = find(scores_matrix.getrow(node_index) == max_score)[1]
    max_score_node_names = w_graph.vs[max_score_node_ind.tolist()]["name"]
    max_score_nodes.append(max_score_node_names)

output["max_score_nodes"] = max_score_nodes
print(output)

You can build your graph like this:您可以像这样构建图表:

import igraph as ig
from igraph import Graph
g = Graph.Formula('A-F, A-C, F-D-C, D-E')
g.es['weight'] = [0.6, 0.4, 0.3, 0.9, 0.9]

Get the index of the highest weight edge for each vertex:获取每个顶点的最高权重边的索引:

strongest_edges = [max(edges, key=g.es['weight'].__getitem__) for edges in g.get_inclist()]

Get the endpoints of each of these edges:获取每个边的端点:

[g.es[eid].tuple for eid in strongest_edges]

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM