
python: finding input to output paths in networkX

EDIT: I'm now looking at how to calculate the number of "looping paths" per node.

As the title says, I'm trying to write a function that calculates the number of "signal paths" for any node in a network. A signal path for a node is a path from one of several inputs to one of several outputs that passes through the node. I'm using an existing networkx function, all_simple_paths, a generator that yields every simple path from an input to an output.

However, even though my code looks right, I'm getting incorrect results. Here's the function:

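from networkx import all_simple_paths

def signal_path_counter(G, inputs, outputs, node):
    c = 0
    paths = []
    # Collect every simple path from each input to each output.
    for out in outputs:
        for i in inputs:
            for path in all_simple_paths(G, i, out):
                paths.append(path)
    # Count the collected paths that contain the node.
    for path in paths:
        for n in path:
            if node == n:
                c += 1
    return c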

Here's the input data:

import networkx as nx
import matplotlib.pyplot as plt
G=nx.DiGraph()
molecules = ["CD40L", "CD40", "NF-kB", "XBP1", "Pax5", "Bach2", "Irf4", "IL-4", "IL-4R", "STAT6", "AID", "Blimp1", "Bcl6", "ERK", "BCR", "STAT3", "Ag", "STAT5", "IL-21R", "IL-21", "IL-2", "IL-2R"]
Bcl6 = [("Bcl6", "Bcl6"), ("Bcl6", "Blimp1"), ("Bcl6", "Irf4")]
STAT5 = [("STAT5", "Bcl6")]
IL_2R = [("IL-2R", "STAT5")]
IL_2 = [("IL-22", "IL-2R")]
BCR = [("BCR", "ERK")]
Ag = [("Ag", "BCR")]
CD40L = [("CD40L", "CD40")]
CD40 = [("CD40", "NF-B")]
NF_B = [("NF-B", "Irf4"), ("NF-B", "AID")]
Irf4 = [("Irf4", "Bcl6"), ("Irf4", "Pax5"), ("Irf4", "Irf4"), ("Irf4", "Blimp1")]
ERK = [("ERK", "Bcl6"), ("ERK", "Blimp1"), ("ERK", "Pax5")]
STAT3 = [("STAT3", "Blimp1")]
IL_21 = [("IL-21", "IL-21R")]
IL_21R = [("IL-21R", "STAT3")]
IL_4R = [("IL-4R", "STAT6")]
STAT6 = [("STAT6", "AID"), ("STAT6", "Bcl6")]
Bach2 = [("Bach2", "Blimp1")]
IL_4 = [("IL-4", "IL-4R")]
Blimp1 = [("Blimp1", "Bcl6"), ("Blimp1", "Bach2"), ("Blimp1", "Pax5"), ("Blimp1", "AID"), ("Blimp1", "Irf4")]
Pax5 = [("Pax5", "Pax5"), ("Pax5", "AID"), ("Pax5", "Bcl6"), ("Pax5", "Bach2"), ("Pax5", "XBP1"), ("Pax5", "ERK"), ("Pax5", "Blimp1")]
edges = (Bcl6 + STAT5 + IL_2R + IL_2 + BCR + Ag + CD40L + CD40 + NF_B + Irf4 +
         ERK + STAT3 + IL_21 + IL_21R + IL_4R + STAT6 + Bach2 + IL_4 + Blimp1 + Pax5)
G.add_nodes_from(molecules)
G.add_edges_from(edges)
sources = ["Ag", "CD40L", "IL-2", "IL-21", "IL-4"]
targets = ["XBP1", "AID"]

A visual representation of the input network is linked here.

The function call that incorrectly returns 0:

print(signal_path_counter(G, sources, targets, "IL-2R"))

Your typo is in this line:

IL_2 = [("IL-22", "IL-2R")]

It should be

IL_2 = [("IL-2", "IL-2R")]
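A typo like this is easy to miss because networkx creates any node named in an edge without complaint, so the misspelled "IL-22" simply becomes a new node. A minimal sketch of a sanity check follows; the helper name undeclared_endpoints is hypothetical, and the data is a three-node excerpt of the question's network rather than the full input.

```python
def undeclared_endpoints(declared_nodes, edges):
    """Return edge endpoints that never appear in the declared node list."""
    declared = set(declared_nodes)
    return sorted({n for edge in edges for n in edge if n not in declared})

# Small excerpt of the question's data, including the misspelled edge.
molecules = ["IL-2", "IL-2R", "STAT5"]
edges = [("IL-22", "IL-2R"), ("IL-2R", "STAT5")]

print(undeclared_endpoints(molecules, edges))  # ['IL-22']
```

Run against the full data from the question, a check like this should also flag "NF-B", since the node list spells that molecule "NF-kB".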

There are some things that can be done to make your code more "pythonic". Iterating over every combination of inputs and outputs can be done more cleanly with itertools.product, which would replace the nested loops over out and over i with

for input, output in itertools.product(inputs, outputs):
    for path in all_simple_paths(G, input, output):
        paths.append(...)

Also, rather than building up paths and then looping through it to check each path for the node, do the test directly as each path is generated:

for input, output in itertools.product(inputs, outputs):
    for path in all_simple_paths(G, input, output):
        if node in path:
            c += 1
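Put together, the whole function might look like the sketch below. The names source, target and count are my own choices (source and target avoid shadowing the built-in input), and the four-edge graph is a toy example, not the network from the question.

```python
import itertools
import networkx as nx

def signal_path_counter(G, inputs, outputs, node):
    """Count simple input-to-output paths that pass through node."""
    count = 0
    for source, target in itertools.product(inputs, outputs):
        for path in nx.all_simple_paths(G, source, target):
            if node in path:
                count += 1
    return count

# Toy graph for illustration only.
G = nx.DiGraph([("IL-2", "IL-2R"), ("IL-2R", "STAT5"),
                ("STAT5", "AID"), ("IL-4", "AID")])

print(signal_path_counter(G, ["IL-2", "IL-4"], ["AID"], "IL-2R"))  # 1
```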

Even for this code, I think it could be made cleaner using a Counter. Basically, if you're ever doing variable += 1, or appending elements to a list while iterating, there's often a "more pythonic" way to do it.
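One way the Counter idea could look, as a sketch: instead of counting for a single node, tally every node on every path in one pass. The function name signal_path_counts and the three-node graph are made up for illustration. Because a simple path never repeats a node, each path adds exactly one to the count of each node on it.

```python
import itertools
from collections import Counter
import networkx as nx

def signal_path_counts(G, inputs, outputs):
    """Map each node to the number of input-to-output simple paths through it."""
    counts = Counter()
    for source, target in itertools.product(inputs, outputs):
        for path in nx.all_simple_paths(G, source, target):
            counts.update(path)  # one increment per node on this path
    return counts

# Toy graph: two paths from A to C, one of them via B.
G = nx.DiGraph([("A", "B"), ("B", "C"), ("A", "C")])
counts = signal_path_counts(G, ["A"], ["C"])

print(counts["B"])  # 1
print(counts["A"])  # 2
```

A Counter returns 0 for nodes it has never seen, so looking up a node that lies on no signal path needs no special handling.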

I am concerned about how well this algorithm will scale to larger networks. Finding all paths is expensive. It may be better to start from node and build all paths from node to the outputs and all paths from the inputs to node. Then convert each path into a set (the conversion makes the next step faster). Then go through the in and out paths pairwise and check their intersection. Both halves contain node, so if they share nothing else, joining them gives a simple path through node.

This would significantly reduce the number of paths you end up having to consider (and likely the length of the paths as well).
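A rough sketch of that split approach is below. It assumes node is neither an input nor an output itself, since those cases would need separate handling; the function name and the toy graph are hypothetical.

```python
import networkx as nx

def signal_path_counter_split(G, inputs, outputs, node):
    """Count signal paths through node by pairing up half-paths.

    Assumes node is neither an input nor an output.
    """
    # Node sets of every simple path from an input to node ...
    in_sets = [set(p) for s in inputs
               for p in nx.all_simple_paths(G, s, node)]
    # ... and of every simple path from node to an output.
    out_sets = [set(p) for t in outputs
                for p in nx.all_simple_paths(G, node, t)]
    # Both halves contain node; they join into a simple path
    # only if node is the one thing they have in common.
    return sum(1 for a in in_sets for b in out_sets if a & b == {node})

# Toy graph in which one pairing must be rejected:
# A-B-N followed by N-B-C would visit B twice.
G = nx.DiGraph([("A", "B"), ("B", "N"), ("A", "N"),
                ("N", "C"), ("N", "B"), ("B", "C")])

print(signal_path_counter_split(G, ["A"], ["C"], "N"))  # 3
```

Here the simple paths from A to C are A-B-N-C, A-B-C, A-N-C and A-N-B-C, three of which pass through N, so the result agrees with the direct count.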
