
Converting Newick to GraphML using Python

I would like to convert a tree from Newick to a format like GraphML that I can open with Cytoscape.

So, I have a file "small.newick" that contains:

((raccoon:1,bear:6):0.8,((sea_lion:11.9, seal:12):7,((monkey:100,cat:47):20, weasel:18):2):3,dog:25);

So far, I have done it this way (Python 3.6.5 |Anaconda):

from Bio import Phylo
import networkx
Tree = Phylo.read("small.newick", 'newick')
G = Phylo.to_networkx(Tree)
networkx.write_graphml(G, 'small.graphml')

[Image 1]

There is a problem with the Clade nodes, which I can fix using this code:

from Bio import Phylo
import networkx

def clade_names_fix(tree):
    # Give every unnamed clade a name based on its position in the traversal
    for idx, clade in enumerate(tree.find_clades()):
        if not clade.name:
            clade.name = str(idx)

Tree = Phylo.read("small.newick", 'newick')
clade_names_fix(Tree)
G = Phylo.to_networkx(Tree)
networkx.write_graphml(G, 'small.graphml')

This gives me something that seems nice enough:

[Image 2]

My questions are:

  • Is that a good way to do it? It seems weird to me that the function does not take care of the internal node names.

  • If you replace one node name with a long enough string, it will be trimmed by the command Phylo.to_networkx(Tree). How can I avoid that?

Example: substitution of "dog" by "test_tring_that_create_some_problem_later_on"

[Image 3]

Looks like you got pretty far on this already. I can only suggest a few alternatives/extensions to your approach...

  1. Unfortunately, I couldn't find a Cytoscape app that can read this format. I tried searching for PHYLIP, NEWICK and PHYLO. You might have more luck:

  2. There is an old Cytoscape 2.x plugin that could read this format, but to run it you would need to install Cytoscape 2.8.3, import the network, export it as XGMML (or save as CYS), and then open that file in Cytoscape 3.7 to migrate back into the land of living code. Then again, if 2.8.3 does what you need for this particular case, maybe you don't need to migrate:

  3. The best approach is programmatic, which you already explored. Finding an R or Python package that turns NEWICK into igraph or GraphML is a solid strategy. Note that there are also updated and slick Cytoscape libraries in those languages, so you can do all label cleanup, layout, data visualization, analysis, export, etc. within the scripting environment:
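As a rough illustration of the programmatic route, here is a minimal sketch that uses only the Python standard library. It is not the Biopython or networkx implementation. The function names `parse_newick` and `to_graphml` and the `internal_<n>` labels for unnamed internal nodes are made up for this example. It assumes plain Newick without quoted labels or comments, and it stores each branch length as an edge attribute named `weight`. Node names are copied as they are, so long names are not shortened.

```python
import xml.etree.ElementTree as ET


def parse_newick(text):
    """Parse a simple Newick string into (nodes, edges).

    nodes: list of node labels; unnamed nodes get a made-up "internal_<n>" label.
    edges: list of (parent, child, branch_length_or_None) tuples.
    """
    text = "".join(text.split()).rstrip(";")  # drop whitespace and the final ';'
    nodes, edges = [], []
    pos = 0

    def read_label():
        # Read characters up to the next structural symbol.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        return text[start:pos]

    def read_subtree():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(read_subtree())
                symbol = text[pos] if pos < len(text) else ""
                pos += 1
                if symbol == ")":
                    break
                if symbol != ",":
                    raise ValueError("malformed Newick near position %d" % pos)
        name = read_label() or "internal_%d" % len(nodes)
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":"
            length = float(read_label())
        nodes.append(name)
        for child_name, child_length in children:
            edges.append((name, child_name, child_length))
        return name, length

    read_subtree()
    return nodes, edges


def to_graphml(nodes, edges):
    """Serialize nodes and edges as a GraphML string."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    ET.SubElement(root, "key", {"id": "d0", "for": "edge",
                                "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", edgedefault="undirected")
    for name in nodes:
        ET.SubElement(graph, "node", id=name)
    for parent, child, length in edges:
        edge = ET.SubElement(graph, "edge", source=parent, target=child)
        if length is not None:
            ET.SubElement(edge, "data", key="d0").text = str(length)
    return ET.tostring(root, encoding="unicode")


newick = ("((raccoon:1,bear:6):0.8,((sea_lion:11.9, seal:12):7,"
          "((monkey:100,cat:47):20, weasel:18):2):3,dog:25);")
nodes, edges = parse_newick(newick)
graphml_text = to_graphml(nodes, edges)
```

Writing `graphml_text` to a file such as "small.graphml" should give something Cytoscape can import, although that import step is not tested here.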

After some research, I actually found a solution that works. I decided to provide the link here for you, dear reader: going to GitHub

FYI for anyone coming across this now: I think the first issue mentioned here has since been solved in Biopython. Using the same data as above, the networkx graph that is built contains all the internal nodes of the tree as well as the terminal nodes.

import matplotlib.pyplot as plt

import networkx
from Bio import Phylo

Tree = Phylo.read("small.newick", 'newick')
G = Phylo.to_networkx(Tree)
networkx.draw_networkx(G)
plt.savefig("small_graph.png")

[Image]

Specs: Python 3.8.10, Biopython 1.78, networkx 2.5
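For the second issue (long names being trimmed), one possible workaround is sketched below. It rests on an assumption that is not confirmed in this thread: that the graph nodes are the clade objects themselves, and that the name is shortened only when each object is turned into a string for the GraphML node id. If that holds, replacing every node with its full `name` string before writing should keep the whole label. The helper `relabel_with_full_names` and the `internal_<n>` fallback labels are made up for this example.

```python
import networkx as nx


def relabel_with_full_names(graph):
    """Return a copy of `graph` whose nodes are plain, untrimmed strings.

    Each node is expected to have a `name` attribute (as clade objects do);
    nodes without a name get a made-up "internal_<n>" label instead.
    """
    mapping = {}
    for idx, node in enumerate(graph.nodes):
        name = getattr(node, "name", None)
        mapping[node] = name if name else "internal_%d" % idx
    # relabel_nodes returns a new graph and keeps the edge attributes
    return nx.relabel_nodes(graph, mapping)


# Hypothetical usage with the tree from the question:
#   G = relabel_with_full_names(Phylo.to_networkx(Tree))
#   nx.write_graphml(G, "small.graphml")
```

Note that two clades with the same name would be merged into one node by this relabeling, so it only makes sense when the names are unique.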
