
Converting GraphML file to another

Hi, I have a simple GraphML file and I would like to remove the node tags from it and save the result to another GraphML file. The file is 3 GB; a sample is given below.

Input File :

<?xml version="1.0" ?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.1/graphml.xsd">
    <key id="weight" for="edge" attr.name="weight" attr.type="string"></key>
    <graph id="G" edgedefault="directed">
        <node id="1"></node>
        <node id="2">
        </node>
        <node id="3">
        </node>
        <node id="4">
        </node>
        <node id="5">
        </node>
        <edge id="6" source="1" target="2">
            <data key="weight">3</data>
        </edge>
        <edge id="7" source="2" target="4">
            <data key="weight">1</data>
        </edge>
        <edge id="8" source="2" target="3">
            <data key="weight">9</data>
        </edge>
    </graph>
</graphml>

Required Output :

<?xml version="1.0" ?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.1/graphml.xsd">
    <key id="weight" for="edge" attr.name="weight" attr.type="string"></key>
    <graph id="G" edgedefault="directed">
        <edge id="6" source="1" target="2">
            <data key="weight">3</data>
        </edge>
        <edge id="7" source="2" target="4">
            <data key="weight">1</data>
        </edge>
        <edge id="8" source="2" target="3">
            <data key="weight">9</data>
        </edge>
    </graph>
</graphml>

Are there any methods to do this?

There is a Python module for working with GraphML. Curiously, its documentation has no remove or delete function.

Since GraphML is XML markup, you could use an XML module instead. I've used xmltodict and liked it very much. This module lets you load XML into a Python object. After modifying the object, you can save it back as XML.

If data is a string containing the XML:

import xmltodict

data_object = xmltodict.parse(data)
# drop all <node> entries under <graph>
del data_object["graphml"]["graph"]["node"]
result = xmltodict.unparse(data_object, pretty=True)

This removes the node entries; unparse returns a string containing the resulting XML.

If the structure of the XML becomes more complex, you'll need to search for the nodes in the data_object. But that shouldn't be a problem, it's just an ordered dictionary.
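When the node entries sit deeper in the structure, or under several graphs, a small recursive helper can do that search. This is a minimal sketch, assuming the parsed object consists only of dicts, lists and strings (the shape xmltodict produces); drop_key is a hypothetical helper name, not part of xmltodict.

```python
def drop_key(obj, key):
    """Recursively remove every entry named `key` from nested dicts/lists, in place."""
    if isinstance(obj, dict):
        obj.pop(key, None)          # remove the entry at this level, if present
        for value in obj.values():  # then descend into the remaining children
            drop_key(value, key)
    elif isinstance(obj, list):
        for item in obj:            # repeated XML elements show up as lists
            drop_key(item, key)
    return obj
```

Calling drop_key(data_object, "node") before unparse would then remove node entries at any depth.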

Another problem might be the size of the xml. 3GB is a lot. xmltodict does support a streaming mode for large files, but that is something I've never used.
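For a file of that size, one alternative is to avoid building any object at all and filter the XML as a stream. The sketch below does not use xmltodict's streaming mode; it swaps in the standard library's xml.sax with an XMLGenerator subclass that copies every event to the output except those inside a node element. NodeStripper and strip_nodes are hypothetical names, and the sketch assumes the file uses the default namespace without a prefix, as in the sample, so the element arrives as plain "node".

```python
import xml.sax
from xml.sax.saxutils import XMLGenerator


class NodeStripper(XMLGenerator):
    """Copy SAX events to the output, skipping <node> elements and their content."""

    def __init__(self, out, skip_tag="node"):
        super().__init__(out, encoding="utf-8", short_empty_elements=False)
        self._skip_tag = skip_tag
        self._depth = 0  # greater than 0 while inside a skipped element

    def startElement(self, name, attrs):
        if self._depth or name == self._skip_tag:
            self._depth += 1  # entering (or nesting inside) a skipped element
            return
        super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1  # leaving a skipped element
            return
        super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def strip_nodes(src, dst):
    """src: path or binary file object to read; dst: writable file object."""
    xml.sax.parse(src, NodeStripper(dst))
```

For the real file, dst would be an output file opened in binary mode and src the path of the input. Memory use stays roughly constant because only the current event is held. The whitespace that surrounded each removed node is copied through, so the output contains some blank lines.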

After reading some links, I came up with a solution based on iterative parsing. But I can't figure out the difference between a simple parse and iterparse in terms of RAM usage.

Important Links :
- http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
- using lxml and iterparse() to parse a big (+- 1Gb) XML file

Code :

import lxml.etree as et

# Namespace-qualified tag names (Clark notation) for the GraphML elements
graphml = {
    "graph": "{http://graphml.graphdrawing.org/xmlns}graph",
    "node": "{http://graphml.graphdrawing.org/xmlns}node",
    "edge": "{http://graphml.graphdrawing.org/xmlns}edge",
    "data": "{http://graphml.graphdrawing.org/xmlns}data",
    "weight": "{http://graphml.graphdrawing.org/xmlns}data[@key='weight']",
    "edgeid": "{http://graphml.graphdrawing.org/xmlns}data[@key='edgeid']",
}

# Yield each <edge> element once it has been fully parsed
for event, elem in et.iterparse("/data/sample.graphml", tag=graphml["edge"], events=("end",)):
    print(et.tostring(elem))
    # Free the element's content, then delete the preceding siblings
    # that are still attached to the parent
    elem.clear()
    while elem.getprevious() is not None:
        del elem.getparent()[0]
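On the RAM question: a plain parse keeps every element attached to the tree until parsing ends, while iterparse lets you discard each element as soon as you are done with it. The sketch below makes that visible by counting how many children the graph element still holds at the end. It uses the standard library's xml.etree.ElementTree rather than lxml (an assumption made so it runs without extra packages), the function names are hypothetical, and it assumes a single flat graph as in the sample.

```python
import xml.etree.ElementTree as ET

NS = "{http://graphml.graphdrawing.org/xmlns}"


def children_kept_by_parse(source):
    """ET.parse builds the whole tree: every node and edge stays in memory."""
    root = ET.parse(source).getroot()
    graph = root.find(NS + "graph")
    return len(graph)


def children_kept_by_iterparse(source):
    """iterparse yields elements as they complete; detaching each one after
    use keeps the partially built tree from growing."""
    graph = None
    edges_seen = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == NS + "graph":
            graph = elem                  # remember the parent element
        elif event == "end" and elem.tag in (NS + "node", NS + "edge"):
            if elem.tag == NS + "edge":
                edges_seen += 1           # this is where an edge would be processed
            graph.remove(elem)            # detach the finished element from the tree
    return edges_seen, len(graph)
```

With the sample input, the first function reports all eight children still attached, while the second has seen the three edges and left none attached. In the lxml code above, elem.clear() together with deleting the preceding siblings plays the role of graph.remove(elem).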
