
Read XML file with pandas

I want to read an XML file into pandas. Here is a sample XML file:

<author lang="en" class="1">
    <documents>
        <document><![CDATA[Hellow how are you]]></document>
        <document><![CDATA[I am good]]></document>
        <document><![CDATA[What about you]]></document>
    </documents>
</author>

This is what I have tried:

from xml.dom import minidom
xmldoc = minidom.parse('text.xml')
itemlist = xmldoc.getElementsByTagName('document')

But I don't know how to move ahead and get the values from itemlist. When I print it, I get the following output:

[<DOM Element: document at 0x170c9b229d0>,
 <DOM Element: document at 0x170c9b22a60>,
 <DOM Element: document at 0x170c9b22af0>]

How can I get strings out of it?

I believe you need to read the value of each element's first child node, which holds the CDATA text:

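import pandas as pd
from xml.dom import minidom

# Parse the file and collect every <document> element
xmldoc = minidom.parse('text.xml')
itemlist = xmldoc.getElementsByTagName('document')

# Each element's first child is the CDATA node; its nodeValue is the string
df = pd.DataFrame({"document": [i.firstChild.nodeValue for i in itemlist]})
print(df)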

Output:

             document
0  Hellow how are you
1           I am good
2      What about you
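
As an alternative, the same strings can probably be pulled out with the standard library's xml.etree.ElementTree, which exposes CDATA content directly as the element's text. This is only a minimal sketch: the helper name document_texts and the SAMPLE_XML constant are made up for illustration, and the XML is inlined as a string so the snippet runs without text.xml.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the sample file from the question (assumption: same content as text.xml)
SAMPLE_XML = """<author lang="en" class="1">
    <documents>
        <document><![CDATA[Hellow how are you]]></document>
        <document><![CDATA[I am good]]></document>
        <document><![CDATA[What about you]]></document>
    </documents>
</author>"""


def document_texts(xml_string):
    """Return the text of every <document> element, in document order."""
    root = ET.fromstring(xml_string)
    # iter() walks the whole tree; el.text is None for an empty element,
    # so fall back to an empty string in that case.
    return [el.text or "" for el in root.iter("document")]


texts = document_texts(SAMPLE_XML)
print(texts)
```

The resulting list can then be passed to pd.DataFrame({"document": texts}) to build the same frame as above. To read from the file instead of a string, ET.parse('text.xml').getroot() gives the same root element. The `or ""` fallback also covers empty <document/> elements, where the minidom version would fail because firstChild is None.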

