Converting GraphML file to another

Hi, I have a simple GraphML file and I would like to remove the node tags from it and save the result in another GraphML file. The GraphML file is 3 GB; a sample is given below.

Input file:

<?xml version="1.0" ?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.1/graphml.xsd">
    <key id="weight" for="edge" attr.name="weight" attr.type="string"></key>
    <graph id="G" edgedefault="directed">
        <node id="1"></node>
        <node id="2">
        </node>
        <node id="3">
        </node>
        <node id="4">
        </node>
        <node id="5">
        </node>
        <edge id="6" source="1" target="2">
            <data key="weight">3</data>
        </edge>
        <edge id="7" source="2" target="4">
            <data key="weight">1</data>
        </edge>
        <edge id="8" source="2" target="3">
            <data key="weight">9</data>
        </edge>
    </graph>
</graphml>

Required output:

<?xml version="1.0" ?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.1/graphml.xsd">
    <key id="weight" for="edge" attr.name="weight" attr.type="string"></key>
    <graph id="G" edgedefault="directed">
        <edge id="6" source="1" target="2">
            <data key="weight">3</data>
        </edge>
        <edge id="7" source="2" target="4">
            <data key="weight">1</data>
        </edge>
        <edge id="8" source="2" target="3">
            <data key="weight">9</data>
        </edge>
    </graph>
</graphml>

Are there any methods to do this?

There is a Python module for dealing with GraphML. Curiously, its documentation lists no remove or delete function.

Since GraphML is XML markup, you could use an XML module instead. I've used xmltodict and liked it very much. This module lets you load XML into a Python object. After modifying the object, you can save it back to XML.

If data is a string containing the XML:

import xmltodict

data_object = xmltodict.parse(data)
del data_object["graphml"]["graph"]["node"]
result = xmltodict.unparse(data_object, pretty=True)

This removes the node entries; unparse returns a string containing the XML.
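For comparison, the same in-memory removal can be sketched with the standard library's xml.etree.ElementTree instead of xmltodict. This is only an illustration: the helper name strip_nodes and the shortened sample are made up here, and like the xmltodict version it loads the whole document, so it suits small files rather than the 3 GB one.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"
# Serialise the GraphML namespace as the default (unprefixed) namespace.
ET.register_namespace("", NS)


def strip_nodes(xml_text):
    """Return xml_text with every <node> child of a <graph> removed."""
    root = ET.fromstring(xml_text)
    for graph in root.iter(f"{{{NS}}}graph"):
        # findall returns a list, so removing inside the loop is safe.
        for node in graph.findall(f"{{{NS}}}node"):
            graph.remove(node)
    return ET.tostring(root, encoding="unicode")


# Shortened, hypothetical version of the sample from the question.
sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="1"></node>
    <node id="2">
    </node>
    <edge id="6" source="1" target="2"><data key="weight">3</data></edge>
  </graph>
</graphml>"""

result = strip_nodes(sample)
print(result)
```

The namespace has to be spelled out in the tag names because every GraphML element lives in the default namespace declared on the root.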

If the structure of the XML becomes more complex, you'll need to search for the nodes in data_object. But that shouldn't be a problem; it's just an ordered dictionary.
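Such a search could look roughly like the recursive helper below. It is a sketch, not part of xmltodict: drop_key is a made-up name, and the dictionary is a hand-written stand-in for parsed output, following xmltodict's default conventions (attributes prefixed with @, repeated elements collected into lists).

```python
def drop_key(obj, key):
    """Recursively delete `key` from nested dicts and lists, in place."""
    if isinstance(obj, dict):
        obj.pop(key, None)
        for value in obj.values():
            drop_key(value, key)
    elif isinstance(obj, list):
        for item in obj:
            drop_key(item, key)
    return obj


# Hypothetical parsed document containing two <graph> elements.
doc = {
    "graphml": {
        "graph": [
            {
                "@id": "G",
                "node": [{"@id": "1"}, {"@id": "2"}],
                "edge": {"@id": "6", "@source": "1", "@target": "2"},
            },
            {
                "@id": "H",
                "node": {"@id": "3"},
            },
        ]
    }
}

drop_key(doc, "node")
print(doc)
```

Because the helper walks every level, it does not need to know where the graph elements sit in the document.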

Another problem might be the size of the XML: 3 GB is a lot. xmltodict does support a streaming mode for large files, but I have never used it.

After reading some links, I came up with a solution based on iterative parsing. But I can't figure out the difference between a simple parse and iterparse in terms of RAM usage.
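On the RAM question: a plain parse builds the complete tree in memory before returning, while iterparse hands over elements as they are completed. iterparse still builds the same tree behind the scenes, so memory only stays low if elements are discarded once they have been processed. A small sketch with the standard library's xml.etree.ElementTree (not lxml) makes this visible; the helper children_kept and the flat test document are invented for the illustration.

```python
import io
import xml.etree.ElementTree as ET


def children_kept(xml_bytes, clear):
    """Stream xml_bytes and report how many children the root holds afterwards."""
    context = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if clear and event == "end" and elem is not root:
            # Drop everything collected under the root so far.
            root.clear()
    return len(root)


# Flat, made-up document: one root with 1000 empty children.
doc = b"<g>" + b"<e/>" * 1000 + b"</g>"

kept_without_clear = children_kept(doc, clear=False)
kept_with_clear = children_kept(doc, clear=True)
print(kept_without_clear, kept_with_clear)
```

Without clearing, the root ends up holding every child, just as after a plain parse; with clearing, nothing is retained.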

Important links:
- http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
- using lxml and iterparse() to parse a big (+- 1Gb) XML file

Code:

import lxml.etree as et

graphml = {  
   "graph": "{http://graphml.graphdrawing.org/xmlns}graph",  
   "node": "{http://graphml.graphdrawing.org/xmlns}node",  
   "edge": "{http://graphml.graphdrawing.org/xmlns}edge",  
   "data": "{http://graphml.graphdrawing.org/xmlns}data",  
   "weight": "{http://graphml.graphdrawing.org/xmlns}data[@key='weight']",  
   "edgeid": "{http://graphml.graphdrawing.org/xmlns}data[@key='edgeid']"  
}



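# Stream the file; only completed <edge> elements are reported.
for event, elem in et.iterparse("/data/sample.graphml", tag=graphml.get("edge"), events=("end",)):
    print(et.tostring(elem))
    # Free the element's content, then drop the siblings that came before it
    # so the partially built tree does not keep growing.
    elem.clear()
    while elem.getprevious() is not None:
        del elem.getparent()[0]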
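The loop above prints each edge but does not write the second GraphML file. One possible way to produce it without holding the document in memory is a SAX filter from the standard library. Note that this is a different technique from lxml's iterparse, and the names DropElements and strip_nodes_streaming are illustrative only. The sketch assumes the elements use the default namespace, so they are reported under the plain name node.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class DropElements(XMLGenerator):
    """Copy SAX events to the output, skipping chosen elements and their content."""

    def __init__(self, out, skip=("node",)):
        super().__init__(out, encoding="utf-8")
        self._skip = set(skip)
        self._depth = 0  # greater than 0 while inside a skipped element

    def startElement(self, name, attrs):
        if self._depth or name in self._skip:
            self._depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def strip_nodes_streaming(source, out):
    """Copy the XML in source (a path or binary file object) to the text stream out."""
    xml.sax.parse(source, DropElements(out))


# Shortened, hypothetical version of the sample from the question.
sample = b"""<?xml version="1.0" ?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
    <graph id="G" edgedefault="directed">
        <node id="1"></node>
        <node id="2">
        </node>
        <edge id="6" source="1" target="2">
            <data key="weight">3</data>
        </edge>
        <edge id="7" source="2" target="4">
            <data key="weight">1</data>
        </edge>
    </graph>
</graphml>"""

buffer = io.StringIO()
strip_nodes_streaming(io.BytesIO(sample), buffer)
output = buffer.getvalue()
print(output)
```

For a real file, out would be something like open("output.graphml", "w", encoding="utf-8") and source the input path. The whitespace that followed each removed node is still copied, so the result contains blank lines where the nodes used to be.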
