Read xml file with pandas

I want to read an XML file into pandas. Here is a sample XML file:

<author lang="en" class="1">
    <documents>
        <document><![CDATA[Hellow how are you]]></document>
        <document><![CDATA[I am good]]></document>
        <document><![CDATA[What about you]]></document>
    </documents>
</author>

This is what I have tried:

from xml.dom import minidom
xmldoc = minidom.parse('text.xml')
itemlist = xmldoc.getElementsByTagName('document')

But I don't know how to move ahead and get the values from itemlist. When I print it, I get the following output:

[<DOM Element: document at 0x170c9b229d0>,
 <DOM Element: document at 0x170c9b22a60>,
 <DOM Element: document at 0x170c9b22af0>]

How can I get the strings out of it?

I believe you need:

import pandas as pd
from xml.dom import minidom
xmldoc = minidom.parse('text.xml')
itemlist = xmldoc.getElementsByTagName('document')

# Each <document> has a single child node (its CDATA section); nodeValue is that node's text.
df = pd.DataFrame({"document": (i.firstChild.nodeValue for i in itemlist)})
print(df)

Output:

             document
0  Hellow how are you
1           I am good
2      What about you
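
Note that firstChild.nodeValue only returns the first node inside each element, so it relies on every <document> holding exactly one text or CDATA node, as in the sample. If that might not hold, a possible alternative is the standard-library xml.etree.ElementTree, which joins all text pieces of an element. The following is only a sketch: the helper name extract_documents and the inlined SAMPLE_XML string are made up for illustration (in practice you would probably parse text.xml with ET.parse instead).

```python
import xml.etree.ElementTree as ET

# Sample XML from the question, inlined so the sketch runs without text.xml.
SAMPLE_XML = """<author lang="en" class="1">
    <documents>
        <document><![CDATA[Hellow how are you]]></document>
        <document><![CDATA[I am good]]></document>
        <document><![CDATA[What about you]]></document>
    </documents>
</author>"""


def extract_documents(xml_text):
    """Hypothetical helper: text of every <document> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree; itertext() yields every text piece
    # (plain text and CDATA alike), which we join into one string.
    return ["".join(el.itertext()) for el in root.iter("document")]


texts = extract_documents(SAMPLE_XML)
print(texts)
```

The resulting list can be passed to pandas the same way as above, e.g. pd.DataFrame({"document": texts}).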
