How to parse multiple XML elements into one string?

Question

Consider the XML below, which has a single Mydoc root containing only two nodes called mycontent:

<?xml version="1.0" encoding="UTF-8" ?>
<Mydoc Time="2017-01-02"
    Period="2017-01-03">
    <mycontent ClassID="kinder">
        <bibliography>
            <Id>
                <Num>123456</Num>
            </Id>
                <Body>
this is some crazy text my friend
                </Body>
                <myreaders>
                    <names>
                        <Id>john</Id>
                        <value>95</value>
                    </names>
                </myreaders>
                <school>
                    <myclass>
                        <Id>12</Id>
                        <name>Laura</name>
                    </myclass>
                    <myclass>
                        <Id>14</Id>
                        <name>Frank</name>
                    </myclass>
                    <myclass>
                        <Id>144</Id>
                        <name>Jonny</name>
                    </myclass>
                    <myclass>
                        <Id>222</Id>
                        <name>Alex</name>
                    </myclass>
                    <myclass>
                        <Id>5443</Id>
                        <name>Johnny Doe</name>
                    </myclass>
                </school>
        </bibliography>
</mycontent>
    <mycontent ClassID="preK">
        <bibliography>
            <Id>
                <Num>123456</Num>
            </Id>
                <Body>
this is another crazy text my friend
                </Body>
                <myreaders>
                    <names>
                        <Id>fritz</Id>
                        <value>133</value>
                    </names>
                </myreaders>
        </bibliography>
</mycontent>
</Mydoc>

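I'm trying to parse it with xml.etree and put each mycontent node into one row of a Pandas dataframe.

However, as you can see, I want to merge the multiple elements called myclass into a single cell of the dataframe.

For example, the expected output (a Pandas dataframe with two rows and two columns) would look something like: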

myreaders      school
"(john-95)"  "(12-Laura),(14-Frank),(144-Jonny),(222-Alex),(5443-Johnny Doe)"
"(fritz-133)" ""

I tried using xpath, but I couldn't combine the xpath results into a single string such as "(12-Laura),(14-Frank),(144-Jonny),(222-Alex),(5443-Johnny Doe)".

Any suggestions?

Thanks!

Answer 1

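You could try xmltodict, which parses the XML into dictionaries and lists; that can make what you're attempting much easier. You can then loop over the list of myclass dictionaries. Hope that helps.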
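A minimal sketch of the dict-and-loop idea. xmltodict is a third-party package, so to keep this runnable with only the standard library, the sketch swaps in a small hypothetical helper, `to_dict`, built on `xml.etree.ElementTree`, which always stores child elements in lists. (With xmltodict itself, a single occurrence of a repeated element such as one `myclass` comes back as a dict rather than a list unless you pass `force_list`.) The XML string is a trimmed copy of the question's document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (assumption: only these parts are needed).
XML = """<Mydoc>
  <mycontent ClassID="kinder"><bibliography>
    <myreaders><names><Id>john</Id><value>95</value></names></myreaders>
    <school>
      <myclass><Id>12</Id><name>Laura</name></myclass>
      <myclass><Id>14</Id><name>Frank</name></myclass>
    </school>
  </bibliography></mycontent>
  <mycontent ClassID="preK"><bibliography>
    <myreaders><names><Id>fritz</Id><value>133</value></names></myreaders>
  </bibliography></mycontent>
</Mydoc>"""


def to_dict(elem):
    """Hypothetical helper: leaf -> stripped text; otherwise tag -> list of children."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        # Always append to a list, even when the tag occurs only once
        result.setdefault(child.tag, []).append(to_dict(child))
    return result


root = ET.fromstring(XML)
rows = []
for mycontent in root.findall("mycontent"):
    bib = to_dict(mycontent)["bibliography"][0]
    # Flatten the lists of names/myclass dicts; missing sections give empty lists
    readers = [n for r in bib.get("myreaders", []) for n in r["names"]]
    classes = [c for s in bib.get("school", []) for c in s["myclass"]]
    rows.append({
        "myreaders": ",".join(f"({n['Id'][0]}-{n['value'][0]})" for n in readers),
        "school": ",".join(f"({c['Id'][0]}-{c['name'][0]})" for c in classes),
    })
print(rows)
```

Passing `rows` to `pd.DataFrame(rows)` then yields one row per mycontent, with an empty school cell for preK.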

Answer 2

This leans rather heavily on list comprehensions, but I think it's what you need.

import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse('test.xml')
root = tree.getroot()
dicty = {}
# Join every (Id-value) pair under myreaders/names into one comma-separated string.
# These paths search all mycontent nodes at once, so everything ends up in a single row.
dicty['myreaders'] = [','.join(['(' + x.find('Id').text + '-' + x.find('value').text + ')' for x in root.findall('.//mycontent/bibliography/myreaders/names')])]
# Same for every school/myclass, formatted as (Id-name).
dicty['school'] = [','.join(['(' + x.find('Id').text + '-' + x.find('name').text + ')' for x in root.findall('.//mycontent/bibliography/school/myclass')])]
print(dicty)
print(pd.DataFrame(dicty))

Output:

   myreaders                                             school
0  (john-95)  (12-Laura),(14-Frank),(144-Jonny),(222-Alex),(...

There's no truly simple way to parse XML; you need a good understanding of how the data is structured.
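A hedged variant of the same approach that produces one row per mycontent instead of one overall row: loop over the `mycontent` elements and use paths relative to each one (`./bibliography/...`), so matches never leak across nodes. `findtext` returns the text of the first matching child. The `pairs` helper name is an illustrative assumption, and the XML is a condensed copy of the question's document.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the question's XML (Id/Num and Body omitted).
XML = """<Mydoc Time="2017-01-02" Period="2017-01-03">
  <mycontent ClassID="kinder">
    <bibliography>
      <myreaders><names><Id>john</Id><value>95</value></names></myreaders>
      <school>
        <myclass><Id>12</Id><name>Laura</name></myclass>
        <myclass><Id>14</Id><name>Frank</name></myclass>
        <myclass><Id>144</Id><name>Jonny</name></myclass>
        <myclass><Id>222</Id><name>Alex</name></myclass>
        <myclass><Id>5443</Id><name>Johnny Doe</name></myclass>
      </school>
    </bibliography>
  </mycontent>
  <mycontent ClassID="preK">
    <bibliography>
      <myreaders><names><Id>fritz</Id><value>133</value></names></myreaders>
    </bibliography>
  </mycontent>
</Mydoc>"""


def pairs(parent, path, key, val):
    """Hypothetical helper: join '(key-val)' for every element matched by the relative path."""
    return ",".join(
        f"({e.findtext(key)}-{e.findtext(val)})" for e in parent.findall(path)
    )


root = ET.fromstring(XML)
# One dict per mycontent; an absent school section yields an empty string
rows = [
    {
        "myreaders": pairs(mc, "./bibliography/myreaders/names", "Id", "value"),
        "school": pairs(mc, "./bibliography/school/myclass", "Id", "name"),
    }
    for mc in root.findall("mycontent")
]
print(rows)
```

Wrapping the result as `pd.DataFrame(rows)` gives the two-row, two-column frame from the question.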

Answer 3

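Have you considered the lxml module? It has some very handy methods for parsing and traversing XML files. For example, I tried the following findall lists and combined them; you can then add the result to the rest of your record: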

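from lxml import etree

tree = etree.parse(path_to_xml_file)  # path_to_xml_file: path to the question's XML file
# XML tags are case-sensitive: the element is <name>, not <Name>
ids = [i.text for i in tree.findall(".//myclass/Id")]
names = [n.text for n in tree.findall(".//myclass/name")]
comb = list(zip(ids, names))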

This gives you:

[('12', 'Laura'), ('14', 'Frank'), ('144', 'Jonny'), ('222', 'Alex'), ('5443', 'Johnny Doe')]
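The zipped pairs still need to be formatted and kept per mycontent to match the question's expected output. A minimal sketch, using the standard library's `xml.etree.ElementTree` in place of lxml (both support `findall` with these paths) on a trimmed copy of the XML. It assumes every `myclass` has both an `Id` and a `name`, since `zip` pairs them purely by position.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (assumption: only the school part matters here).
XML = """<Mydoc>
  <mycontent ClassID="kinder"><bibliography><school>
    <myclass><Id>12</Id><name>Laura</name></myclass>
    <myclass><Id>14</Id><name>Frank</name></myclass>
  </school></bibliography></mycontent>
  <mycontent ClassID="preK"><bibliography/></mycontent>
</Mydoc>"""

root = ET.fromstring(XML)
school_cells = []
for mycontent in root.findall("mycontent"):
    # Search relative to this mycontent so its classes are not mixed with others
    ids = [i.text for i in mycontent.findall(".//myclass/Id")]
    names = [n.text for n in mycontent.findall(".//myclass/name")]
    # Format each (Id, name) pair and join them into a single cell
    school_cells.append(",".join(f"({i}-{n})" for i, n in zip(ids, names)))
print(school_cells)
```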

Answer 4

Similar to the other answers, a bit shorter, and it works with the newly added node:

import xml.etree.ElementTree as ET
import pandas as pd

parsedXML = ET.parse("sample.xml")
root = parsedXML.getroot()
pairs0 = []  # one myreaders string per mycontent
pairs1 = []  # one school string per mycontent
for mycontent in root.iter('mycontent'):
    # name[0] is <Id>, name[1] is <value>
    pairs0.append(','.join(['(' + name[0].text + '-' + name[1].text + ')' for name in mycontent.iter('names')]))
    # myclass[0] is <Id>, myclass[1] is <name>
    pairs1.append(','.join(['(' + myclass[0].text + '-' + myclass[1].text + ')' for myclass in mycontent.iter('myclass')]))
df = pd.DataFrame(data={"myreaders": pairs0, "school": pairs1}, columns=['myreaders', 'school'])

Edit: reworked to handle multiple cases.


4 solutions

Solution 1: score 1, 2018-04-16 20:50:11
Solution 2: score 1, 2018-04-16 20:51:09
Solution 3: score 1, 2018-04-16 21:04:29
Solution 4: score 1, accepted, 2018-04-16 21:36:45
