Python: parse nested XML

I have an XML file that contains multiple layers of data.

<?xml version="1.0" encoding="UTF-8"?>
<DeviceLog DevID="10503847" DocDate="2017-03-01T00:00:00" BSLogDate="2017-02-28T06:22:36">
    <Log LogTime="2017-02-27T18:33:58">
        <DevLog State="PowerOn"/>
    </Log>
    <Log LogTime="2017-02-28T08:59:03">
        <ComponentPrivateDataLog>
            <Component>1</Component>
            <DataType>1</DataType>
            <PrivateData>0301</PrivateData>
        </ComponentPrivateDataLog>
    </Log>
    <Log LogTime="2017-02-28T08:59:13">
        <ComponentPrivateDataLog>
            <Component>1</Component>
            <DataType>1</DataType>
            <PrivateData>0401</PrivateData>
        </ComponentPrivateDataLog>
    </Log>
    <Log LogTime="2017-02-28T10:16:44">
        <DevLog State="StandByIn"/>
    </Log>
    <Log LogTime="2017-02-28T12:29:55">
        <EndOfFileLog />
    </Log>
</DeviceLog>

Here, each Log tag is a separate entity with its own LogTime attribute and a single child node. I am using minidom to parse the data.

This is the code:

from xml.dom import minidom
xmldoc=minidom.parse("testxml.xml")
dl=xmldoc.getElementsByTagName("DeviceLog")
for d in dl:
    dId=d.attributes["DevID"]
    dId=dId.value
    dod=d.attributes["DocDate"]
    dod=dod.value
    bsld=d.attributes["BSLogDate"]
    bsld=bsld.value

log=xmldoc.getElementsByTagName("Log")
for l in log:
    logtime = l.attributes["LogTime"]
    logtime = logtime.value 
    devLog = l.getElementsByTagName("DevLog")
    for dl in devLog:
        devEvnt = dl.attributes["State"]    
        devEvnt = devEvnt.value
print dId,dod,bsld,logtime, devEvnt

The code above prints only the time and state of the StandBy entry (the last one), not the first PowerOn state. I tried indexing with log=xmldoc.getElementsByTagName("Log")[0], and did the same for logtime, but that didn't work.

How can I parse the logs so that each log is printed with its time on a separate line?

If it helps, you can use a parser that reads your XML data into a dictionary, which is a bit easier to deal with.

import xmltodict

myxml = """
...
"""
mydict = xmltodict.parse(myxml)
logs = mydict["DeviceLog"]["Log"]

for log in logs:
    log_time = log["@LogTime"]
    dev_log = log.get("DevLog", None)
    component_log = log.get("ComponentPrivateDataLog", None)

    if dev_log:
        print(log_time, dev_log["@State"])
    if component_log:
        print(log_time, component_log["Component"], component_log["PrivateData"])

xmltodict is one example of such a parser.
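
If you would rather stay with minidom, the likely cause of the single output line is that the print statement in the question sits outside the loop over the Log elements, so it runs once, with whatever values were assigned last. Below is a minimal sketch of one way to emit a row per Log element. The helper name iter_log_rows, the choice of PrivateData as the detail column, and the trimmed sample data (XML declaration and one entry omitted) are assumptions made for illustration, not part of the original post.

```python
from xml.dom import minidom

# Sample data from the question, trimmed for brevity.
SAMPLE_XML = """<DeviceLog DevID="10503847" DocDate="2017-03-01T00:00:00" BSLogDate="2017-02-28T06:22:36">
    <Log LogTime="2017-02-27T18:33:58">
        <DevLog State="PowerOn"/>
    </Log>
    <Log LogTime="2017-02-28T08:59:03">
        <ComponentPrivateDataLog>
            <Component>1</Component>
            <DataType>1</DataType>
            <PrivateData>0301</PrivateData>
        </ComponentPrivateDataLog>
    </Log>
    <Log LogTime="2017-02-28T10:16:44">
        <DevLog State="StandByIn"/>
    </Log>
    <Log LogTime="2017-02-28T12:29:55">
        <EndOfFileLog />
    </Log>
</DeviceLog>"""


def iter_log_rows(xml_text):
    """Yield one tuple per <Log> element (hypothetical helper)."""
    root = minidom.parseString(xml_text).documentElement
    # Attributes of the enclosing <DeviceLog> are the same for every row.
    header = (
        root.getAttribute("DevID"),
        root.getAttribute("DocDate"),
        root.getAttribute("BSLogDate"),
    )
    for log in root.getElementsByTagName("Log"):
        log_time = log.getAttribute("LogTime")
        # Skip whitespace text nodes and take the first element child.
        child = next(
            (n for n in log.childNodes
             if n.nodeType == minidom.Node.ELEMENT_NODE),
            None,
        )
        if child is None:
            continue
        if child.tagName == "DevLog":
            detail = child.getAttribute("State")
        elif child.tagName == "ComponentPrivateDataLog":
            # Assumption: PrivateData is the value of interest here.
            nodes = child.getElementsByTagName("PrivateData")
            has_text = bool(nodes) and nodes[0].firstChild is not None
            detail = nodes[0].firstChild.data if has_text else ""
        else:
            detail = ""
        # Yield inside the loop, so every <Log> produces its own row.
        yield header + (log_time, child.tagName, detail)


rows = list(iter_log_rows(SAMPLE_XML))
for row in rows:
    print(" ".join(row))
```

To read from a file instead, minidom.parse("testxml.xml") can replace the parseString call. Log entries whose child has no attribute or text of interest, such as EndOfFileLog, still produce a row with an empty detail column.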
