Converting one GraphML file into another

Question

Hi, I have a simple GraphML file. I want to remove the node tags from it and save the result in another GraphML file. The GraphML file is 3 GB in size; a sample is given below.

Input file:

<?xml version="1.0" ?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.1/graphml.xsd">
    <key id="weight" for="edge" attr.name="weight" attr.type="string"></key>
    <graph id="G" edgedefault="directed">
        <node id="1"></node>
        <node id="2">
        </node>
        <node id="3">
        </node>
        <node id="4">
        </node>
        <node id="5">
        </node>
        <edge id="6" source="1" target="2">
            <data key="weight">3</data>
        </edge>
        <edge id="7" source="2" target="4">
            <data key="weight">1</data>
        </edge>
        <edge id="8" source="2" target="3">
            <data key="weight">9</data>
        </edge>
    </graph>
</graphml>

Desired output:

<?xml version="1.0" ?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.1/graphml.xsd">
    <key id="weight" for="edge" attr.name="weight" attr.type="string"></key>
    <graph id="G" edgedefault="directed">
        <edge id="6" source="1" target="2">
            <data key="weight">3</data>
        </edge>
        <edge id="7" source="2" target="4">
            <data key="weight">1</data>
        </edge>
        <edge id="8" source="2" target="3">
            <data key="weight">9</data>
        </edge>
    </graph>
</graphml>

Is there a way to do this?

Answer 1

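There is a Python module for working with GraphML. Oddly, its documentation lists no remove or delete function.

Since GraphML is XML markup, you can use an XML module instead. I have used xmltodict and liked it a lot. This module lets you load XML into a Python object. After modifying the object, you can save it back to XML.

If data is a string containing the XML: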

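import xmltodict

data_object = xmltodict.parse(data)  # parse the XML string into nested dicts
del data_object["graphml"]["graph"]["node"]  # drop every <node> entry of the graph
result = xmltodict.unparse(data_object, pretty=True)  # serialize back to an XML string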

This removes the node entries, and unparse returns a string containing the XML.

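If the structure of the XML becomes more complex, you need to search data_object for the nodes. That should not be a problem, though: it is just an ordered dictionary.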
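As a rough illustration of searching the parsed object, here is a minimal sketch of a recursive helper. The name drop_key is hypothetical, and the dictionary below is a hand-written stand-in that assumes the usual xmltodict layout (attributes prefixed with "@", repeated elements as lists); it is not output produced by xmltodict itself.

```python
def drop_key(obj, key):
    """Recursively delete `key` from every dict nested inside dicts and lists."""
    if isinstance(obj, dict):
        obj.pop(key, None)  # remove the key at this level, if present
        for value in obj.values():
            drop_key(value, key)
    elif isinstance(obj, list):
        for item in obj:
            drop_key(item, key)


# Hand-written stand-in for a parsed document with two <graph> elements
# (assumed layout, see the note above).
doc = {
    "graphml": {
        "graph": [
            {"@id": "G1", "node": [{"@id": "1"}, {"@id": "2"}],
             "edge": {"@id": "6", "@source": "1", "@target": "2"}},
            {"@id": "G2", "node": {"@id": "3"},
             "edge": {"@id": "7", "@source": "3", "@target": "3"}},
        ]
    }
}

drop_key(doc, "node")  # every "node" entry is gone, the edges are untouched
```

The helper modifies the object in place, so the result can be passed straight to unparse afterwards.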

Another problem could be the size of the XML. 3 GB is a lot. xmltodict does support a streaming mode for large files, but that is something I have never used.
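For a file of that size, one option that avoids building any in-memory object is a SAX filter from the standard library. This swaps xmltodict for xml.sax, so treat it as a sketch rather than the approach described above. The class DropElements and the function strip_nodes are hypothetical names, and the sketch assumes the elements use no namespace prefix, as in the sample.

```python
import io
from xml.sax import make_parser
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropElements(XMLFilterBase):
    """SAX filter that swallows the named elements together with their content."""

    def __init__(self, parent, names):
        super().__init__(parent)
        self._names = set(names)
        self._depth = 0  # greater than 0 while inside a dropped element

    def startElement(self, name, attrs):
        if self._depth or name in self._names:
            self._depth += 1  # entering (or nesting inside) a dropped element
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def strip_nodes(source, out):
    """Copy XML from `source` (path or binary file object) to the text stream
    `out`, leaving out every <node> element."""
    reader = DropElements(make_parser(), ["node"])
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(source)  # the parser reads the input in chunks


# Small in-memory demonstration based on the sample from the question.
sample = io.BytesIO(b"""<?xml version="1.0" ?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
    <graph id="G" edgedefault="directed">
        <node id="1"></node>
        <node id="2">
        </node>
        <edge id="6" source="1" target="2">
            <data key="weight">3</data>
        </edge>
    </graph>
</graphml>
""")
result = io.StringIO()
strip_nodes(sample, result)
print(result.getvalue())
```

For a real file, source can be the input path and out a file opened in text mode with encoding="utf-8". The whitespace that surrounded the removed elements is kept, so the output contains some blank lines.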

Answer 2

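After reading some links, I came up with a solution that uses iterative parsing. But I could not work out how simple parsing and iterative parsing differ in terms of RAM usage.

Important links: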
- http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
- Using lxml and iterparse() to parse a big (+- 1Gb) XML file

Code:

import lxml.etree as et

graphml = {  
   "graph": "{http://graphml.graphdrawing.org/xmlns}graph",  
   "node": "{http://graphml.graphdrawing.org/xmlns}node",  
   "edge": "{http://graphml.graphdrawing.org/xmlns}edge",  
   "data": "{http://graphml.graphdrawing.org/xmlns}data",  
   "weight": "{http://graphml.graphdrawing.org/xmlns}data[@key='weight']",  
   "edgeid": "{http://graphml.graphdrawing.org/xmlns}data[@key='edgeid']"  
}



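# Stream the file and handle each <edge> once it has been fully parsed.
for event, elem in et.iterparse("/data/sample.graphml", tag=graphml.get("edge"), events=("end",)):
    print(et.tostring(elem))
    # Free the element, then drop already-processed siblings still held by the parent.
    elem.clear()
    while elem.getprevious() is not None:
        del elem.getparent()[0]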
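On the RAM question: a plain parse builds the complete tree before it returns, while iterparse hands over elements as they are completed, so memory only stays low if finished elements are discarded along the way. The sketch below shows the same idea with the standard library's xml.etree.ElementTree instead of lxml. The function name iter_edges is hypothetical, and the code assumes a flat file like the sample, where every node and edge is a direct child of a single graph element.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://graphml.graphdrawing.org/xmlns}"


def iter_edges(source):
    """Yield (source, target, weight) for each <edge>, discarding elements once read."""
    graph = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == NS + "graph":
                graph = elem  # remember the parent so finished children can be dropped
            continue
        if elem.tag == NS + "edge":
            weight = None
            for data in elem.findall(NS + "data"):
                if data.get("key") == "weight":
                    weight = data.text
            yield elem.get("source"), elem.get("target"), weight
        if graph is not None and elem.tag in (NS + "node", NS + "edge"):
            graph.remove(elem)  # detach the finished child so it can be freed


# In-memory demonstration; a file path works as `source` as well.
sample = io.BytesIO(b"""<?xml version="1.0" ?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
    <graph id="G" edgedefault="directed">
        <node id="1"></node>
        <node id="2"></node>
        <edge id="6" source="1" target="2"><data key="weight">3</data></edge>
        <edge id="7" source="2" target="4"><data key="weight">1</data></edge>
        <edge id="8" source="2" target="3"><data key="weight">9</data></edge>
    </graph>
</graphml>
""")
edges = list(iter_edges(sample))
print(edges)
```

Unlike lxml, ElementTree elements have no getparent() or getprevious(), which is why the sketch tracks the graph element itself through the "start" events.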


Answer 1
1 2017-01-19 07:31:22

Answer 2
1 2017-01-20 11:11:07
