EDIT: Now looking into how to calculate the number of "looping paths" per node
As the title says, I'm trying to write a function that calculates the number of "signal paths" for any node in a network. A signal path for a node is a path from one of multiple inputs to one of multiple outputs that the node is a part of. I'm using an existing algorithm called all_simple_paths, a generator that yields every simple path from an input to an output.
However, even though my code looks right, I'm getting incorrect results. Here's the function:
def signal_path_counter(G, inputs, outputs, node):
    c = 0
    paths = []
    for out in outputs:
        for i in inputs:
            for path in all_simple_paths(G, i, out):
                paths.append(path)
    for path in paths:
        for n in path:
            if node == n:
                c += 1
    return c
Here's the input data:
import networkx as nx
import matplotlib.pyplot as plt
G=nx.DiGraph()
molecules = ["CD40L", "CD40", "NF-kB", "XBP1", "Pax5", "Bach2", "Irf4", "IL-4", "IL-4R", "STAT6", "AID", "Blimp1", "Bcl6", "ERK", "BCR", "STAT3", "Ag", "STAT5", "IL-21R", "IL-21", "IL-2", "IL-2R"]
Bcl6 = [("Bcl6", "Bcl6"), ("Bcl6", "Blimp1"), ("Bcl6", "Irf4")]
STAT5 = [("STAT5", "Bcl6")]
IL_2R = [("IL-2R", "STAT5")]
IL_2 = [("IL-22", "IL-2R")]
BCR = [("BCR", "ERK")]
Ag = [("Ag", "BCR")]
CD40L = [("CD40L", "CD40")]
CD40 = [("CD40", "NF-B")]
NF_B = [("NF-B", "Irf4"), ("NF-B", "AID")]
Irf4 = [("Irf4", "Bcl6"), ("Irf4", "Pax5"), ("Irf4", "Irf4"), ("Irf4", "Blimp1")]
ERK = [("ERK", "Bcl6"), ("ERK", "Blimp1"), ("ERK", "Pax5")]
STAT3 = [("STAT3", "Blimp1")]
IL_21 = [("IL-21", "IL-21R")]
IL_21R = [("IL-21R", "STAT3")]
IL_4R = [("IL-4R", "STAT6")]
STAT6 = [("STAT6", "AID"), ("STAT6", "Bcl6")]
Bach2 = [("Bach2", "Blimp1")]
IL_4 = [("IL-4", "IL-4R")]
Blimp1 = [("Blimp1", "Bcl6"), ("Blimp1", "Bach2"), ("Blimp1", "Pax5"), ("Blimp1", "AID"), ("Blimp1", "Irf4")]
Pax5 = [("Pax5", "Pax5"), ("Pax5", "AID"), ("Pax5", "Bcl6"), ("Pax5", "Bach2"), ("Pax5", "XBP1"), ("Pax5", "ERK"), ("Pax5", "Blimp1")]
edges = (Bcl6 + STAT5 + IL_2R + IL_2 + BCR + Ag + CD40L + CD40 + NF_B + Irf4 +
         ERK + STAT3 + IL_21 + IL_21R + IL_4R + STAT6 + Bach2 + IL_4 + Blimp1 + Pax5)
G.add_nodes_from(molecules)
G.add_edges_from(edges)
sources = ["Ag", "CD40L", "IL-2", "IL-21", "IL-4"]
targets = ["XBP1", "AID"]
A visual representation of the input network is here.
The function call that gives the incorrect result of 0:
print(signal_path_counter(G, sources, targets, "IL-2R"))
Your typo is in this line:
IL_2 = [("IL-22", "IL-2R")]
It should be
IL_2 = [("IL-2", "IL-2R")]
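This kind of typo is easy to miss because add_edges_from quietly creates any node named in an edge that is not already in the graph. As a rough sketch (the variable names known and unknown are mine, and only a small subset of the question's data is reproduced), you could compare the names used in the edges against the molecules list before building the graph:

```python
# Subset of the question's data, just enough to show the check.
molecules = ["IL-2", "IL-2R", "STAT5", "Bcl6"]
edges = [("IL-22", "IL-2R"), ("IL-2R", "STAT5"), ("STAT5", "Bcl6")]

# Every name that appears in an edge but was never declared as a molecule.
known = set(molecules)
unknown = sorted({name for edge in edges for name in edge} - known)
print(unknown)  # ['IL-22']
```

Run against the full data, the same check would also flag any other spelling that does not match the molecules list exactly, for example "NF-B" in the edges versus "NF-kB" in the list.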
There are some things that can be done with your code to make it more "pythonic". Iterating over multiple combinations can be done more cleanly using itertools.product, which would replace the loop over out and over i with
for input, output in itertools.product(inputs, outputs):
    for path in all_simple_paths(G, input, output):
        paths.append(...)
Also, rather than building up paths and then looping through paths to test whether the node is in each one, do the test directly instead of appending to paths:
for input, output in itertools.product(inputs, outputs):
    for path in all_simple_paths(G, input, output):
        if node in path:
            c += 1
Even for this code, I think it could be made cleaner using a Counter. Basically, if you're ever doing variable += 1, or appending elements to a list while iterating, there's often a "more pythonic" way to do it.
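For what it's worth, here is one way the Counter version might look. It is only a sketch: the function name signal_path_counts and the tiny demo graph are made up for illustration, and it tallies every node in one pass rather than a single node, so you can look up whichever one you need afterwards.

```python
from collections import Counter
from itertools import product

import networkx as nx


def signal_path_counts(G, inputs, outputs):
    """Count, for every node, the simple input-to-output paths it lies on."""
    counts = Counter()
    for source, target in product(inputs, outputs):
        for path in nx.all_simple_paths(G, source, target):
            # A simple path visits each node at most once, so update()
            # adds exactly one to every node on this path.
            counts.update(path)
    return counts


# Small hypothetical graph, not the network from the question.
demo = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
demo_counts = signal_path_counts(demo, ["a"], ["d"])
print(demo_counts["b"], demo_counts["d"])
```

A Counter returns 0 for nodes it has never seen, so a node that lies on no signal path needs no special handling.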
I am concerned about how well this algorithm will scale for larger networks. Finding all paths is expensive. It may be better to start from node and build all paths from node to outputs and all paths from inputs to node. Then convert each path into a set [the conversion into sets makes the next step faster]. Then go through the in and out paths and check whether each pair shares any node other than node itself. If it doesn't, then joining the two gives you a simple path through node.
This would significantly reduce the number of paths you end up having to consider (and likely the length of the paths as well).
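A rough sketch of that split-at-node idea follows. The function name count_paths_through and the demo graph are hypothetical, and it assumes inputs and outputs have no nodes in common. I have not benchmarked it against the original approach.

```python
import networkx as nx


def count_paths_through(G, inputs, outputs, node):
    """Count simple input-to-output paths that contain `node`.

    Assumes `inputs` and `outputs` have no nodes in common.
    """
    # Node sets of every simple path from an input to `node`, minus `node`.
    # If `node` is itself an input, the empty prefix counts as one path.
    before = [set()] if node in inputs else []
    for source in inputs:
        if source != node:
            before += [set(p[:-1]) for p in nx.all_simple_paths(G, source, node)]

    # Node sets of every simple path from `node` to an output, minus `node`.
    after = [set()] if node in outputs else []
    for target in outputs:
        if target != node:
            after += [set(p[1:]) for p in nx.all_simple_paths(G, node, target)]

    # Two halves join into a simple path only if they share no node.
    return sum(1 for a in before for b in after if a.isdisjoint(b))


# Hypothetical graph in which one in/out pair overlaps at "x".
demo = nx.DiGraph([("s", "x"), ("x", "n"), ("n", "x"),
                   ("x", "t"), ("n", "t"), ("s", "n")])
through_n = count_paths_through(demo, ["s"], ["t"], "n")
print(through_n)  # 3
```

In the demo, the halves s -> x -> n and n -> x -> t both use x, so that pair is rejected, while the other three combinations are counted.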