I want to read an XML file into pandas. Here is a sample XML file:
<author lang="en" class="1">
<documents>
<document><![CDATA[Hellow how are you]]></document>
<document><![CDATA[I am good]]></document>
<document><![CDATA[What about you]]></document>
</documents>
</author>
This is what I have tried:
from xml.dom import minidom
xmldoc = minidom.parse('text.xml')
itemlist = xmldoc.getElementsByTagName('document')
But I don't know how to proceed and get the values out of itemlist. When I print it, I get the following output:
[<DOM Element: document at 0x170c9b229d0>,
<DOM Element: document at 0x170c9b22a60>,
<DOM Element: document at 0x170c9b22af0>]
How can I get strings out of it?
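Each entry printed above is an xml.dom.minidom Element node. Its text is not stored on the element itself but in its child nodes, which for this sample is a single CDATA section. The following is a minimal sketch, assuming the sample XML is inlined as a string so it runs without text.xml on disk. It uses a hypothetical helper named element_text (not part of minidom) that joins the data of every text and CDATA child of an element:

```python
from xml.dom import minidom

# Inline copy of the sample XML (assumption: stands in for text.xml).
SAMPLE_XML = """<author lang="en" class="1">
<documents>
<document><![CDATA[Hellow how are you]]></document>
<document><![CDATA[I am good]]></document>
<document><![CDATA[What about you]]></document>
</documents>
</author>"""


def element_text(element):
    """Hypothetical helper: concatenate all text and CDATA children of an Element."""
    return "".join(
        node.data
        for node in element.childNodes
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
    )


xmldoc = minidom.parseString(SAMPLE_XML)
itemlist = xmldoc.getElementsByTagName("document")
texts = [element_text(item) for item in itemlist]
print(texts)  # ['Hellow how are you', 'I am good', 'What about you']
```

Joining all children, instead of reading only the first one, keeps working if the parser ever splits the text into several adjacent nodes.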
I believe you need something like this:
import pandas as pd
from xml.dom import minidom
xmldoc = minidom.parse('text.xml')
itemlist = xmldoc.getElementsByTagName('document')
# Each <document> has exactly one child node (the CDATA section); its nodeValue holds the text
df = pd.DataFrame({"document": [i.firstChild.nodeValue for i in itemlist]})
print(df)
Output:
             document
0  Hellow how are you
1           I am good
2      What about you
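If minidom is not a requirement, the standard library's xml.etree.ElementTree is a shorter route, because it exposes CDATA content as ordinary element text. This is a swapped-in alternative rather than the approach above; the sample XML is again inlined as a string, and SAMPLE_XML is a hypothetical name:

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Inline copy of the sample XML (assumption: stands in for text.xml).
SAMPLE_XML = """<author lang="en" class="1">
<documents>
<document><![CDATA[Hellow how are you]]></document>
<document><![CDATA[I am good]]></document>
<document><![CDATA[What about you]]></document>
</documents>
</author>"""

root = ET.fromstring(SAMPLE_XML)
# iter() walks every descendant <document>; `or ""` guards against empty elements,
# whose .text would otherwise be None.
df = pd.DataFrame({"document": [el.text or "" for el in root.iter("document")]})
print(df)
```

For the real file, ET.parse('text.xml').getroot() replaces ET.fromstring(SAMPLE_XML).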