
Improve creation of undirected graph projected from a directed one using Python

I have a bipartite directed graph in which each legal entity is connected by an edge to every candidate it sponsored or cosponsored. From it, I want a second, unipartite, undirected graph G, projected from the first, in which the nodes are candidates and the weighted edges connecting them indicate how many times they received money together from the same legal entity.

All the information is encoded in a dataframe candidate_donator, in which each candidate is associated with a tuple of the entities that donated to them.

I'm using NetworkX to create the network and want to optimize my implementation because it is taking very long. My original approach is:

import itertools
import networkx as nx

G = nx.Graph()
# one row per candidate, holding a tuple of its distinct donators
candidate_donator = df.groupby('candidate').agg({'donator': lambda x: tuple(set(x))})

# all unique unordered pairs of candidates (~83 M)
candidate_pairs = list(itertools.combinations(candidate_donator.index, 2))

for cpf1, cpf2 in candidate_pairs:
    # donators shared by both candidates (look them up in candidate_donator, not candidate_pairs)
    donators_filter = list(filter(set(candidate_donator.loc[cpf1, 'donator']).__contains__, candidate_donator.loc[cpf2, 'donator']))
    G.add_edge(cpf1, cpf2, weight=len(donators_filter))

Try this:

import networkx as nx

G = nx.Graph()  #skip this line if G already exists
#list of donators per candidate
candidate_donator = df.groupby('candidate').agg({'donator': lambda x: tuple(set(x))})
#list of candidates per donator
donator_candidate = df.groupby('donator').agg({'candidate': lambda x: tuple(set(x))})

#for each candidate
for candidate_idx in candidate_donator.index:
    #for each donator connected to this candidate
    for donator_idx in candidate_donator.loc[candidate_idx, 'donator']:
        #for each candidate funded by that same donator
        for last_candidate in donator_candidate.loc[donator_idx, 'candidate']:
            #a candidate is not paired with itself
            if last_candidate == candidate_idx:
                continue
            #existing edge, add weight
            if G.has_edge(candidate_idx, last_candidate):
                G[candidate_idx][last_candidate]['weight'] += 0.5
            #non existing edge, weight = 0.5 (every pair is visited twice, once from each end)
            else:
                G.add_edge(candidate_idx, last_candidate, weight=0.5)
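
The same idea can also be written by looping over donators instead of candidates, so each pair is counted once per shared donator and the 0.5 trick is not needed. The following is only a sketch under assumptions: the function name co_donation_weights and the toy data are made up for illustration, and the input is assumed to be plain (candidate, donator) pairs such as the rows of df[['candidate', 'donator']].

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_donation_weights(donations):
    """Count, for each unordered candidate pair, how many donators they share.

    `donations` is an iterable of (candidate, donator) pairs
    (hypothetical input shape, e.g. rows of df[['candidate', 'donator']]).
    """
    # Group the distinct candidates funded by each donator.
    candidates_by_donator = defaultdict(set)
    for candidate, donator in donations:
        candidates_by_donator[donator].add(candidate)

    weights = Counter()
    for candidates in candidates_by_donator.values():
        # Sorting makes (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(candidates), 2):
            weights[pair] += 1
    return weights


# Hypothetical toy data, not taken from the question.
donations = [
    ("alice", "acme"), ("bob", "acme"), ("carol", "acme"),
    ("alice", "globex"), ("bob", "globex"),
    ("dave", "initech"),
]
weights = co_donation_weights(donations)
```

The resulting pairs can then be loaded into the graph in one call, for example G.add_weighted_edges_from((a, b, w) for (a, b), w in weights.items()). Only pairs that share at least one donator are visited, so candidates with no shared donator get no edge; if they should still appear as isolated nodes, add them separately with G.add_nodes_from.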
