
How to parse an irregular XML file which has multiple lines

I have an XML file named file.txt, as shown below:

<message><header><msg-date></msg-date><msg-time></msg-time><sys-id></sys-id></header><record><remittance-details></remittance-details><source-sys-id></source-sys-id></message>
<message><header><msg-date></msg-date><msg-time></msg-time><sys-id></sys-id></header><record><remittance-details></remittance-details><source-sys-id></source-sys-id></message>
<message><header><msg-date></msg-date><msg-time></msg-time><sys-id></sys-id></header><record><remittance-details></remittance-details><source-sys-id></source-sys-id></message>
<message><header><msg-date></msg-date><msg-time></msg-time><sys-id></sys-id></header><record><remittance-details></remittance-details><source-sys-id></source-sys-id></message>
<message><header><msg-date></msg-date><msg-time></msg-time><sys-id></sys-id></header><record><remittance-details></remittance-details><source-sys-id></source-sys-id></message>

I need to process the above file after converting it to a standard, indented format like the one below:

<message>
  <header>
    <msg-date></msg-date>
    <msg-time></msg-time>
    <sys-id></sys-id>
  </header>
  <record>
    <remittance-details></remittance-details>
  </record>
</message>

Updated the XML details to avoid confusion. The examples shown above are only illustrative, since I am unable to share the full details here (please ignore any missing tags).

I have written the code below to parse it:

import os
import lxml.etree as etree

# Resolve file.txt next to this script
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
file = os.path.join(BASE_DIR, 'file.txt')

# recover=True asks lxml to keep going past malformed markup
parser = etree.XMLParser(recover=True)
dom = etree.parse(file, parser=parser)
xmlstr = etree.tostring(dom, pretty_print=True)

# Overwrite the input file with the pretty-printed result
with open(file, "wb") as f:
    f.write(xmlstr)

However, it parses only the first line of the file rather than the complete file, so the processing fails. I would therefore like to understand how to parse all of the XML lines in the file so that I can process it.

You have a few issues:

  1. There is no root element in your document. This can be solved by wrapping the XML text in <root>..</root>.
  2. The tag remittance-details is not closed, so it is invalid XML.
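
The two points above can be sketched as follows. This is only an illustration under some assumptions: it uses the standard library's strict xml.etree.ElementTree rather than lxml, the helper name parse_concatenated and the shortened sample string are made up for the example, and the sample has every tag closed (unlike the file in the question). It also assumes the text carries no XML declaration, since one would not be allowed inside the wrapper.

```python
import xml.etree.ElementTree as ET


def parse_concatenated(text, root_tag="root"):
    """Wrap several top-level elements in one synthetic root and parse them.

    Hypothetical helper: an XML document may have only one root element,
    so the concatenated <message> elements need a common parent.
    """
    return ET.fromstring(f"<{root_tag}>{text}</{root_tag}>")


# Shortened, well-formed sample (assumption: every tag is closed).
sample = (
    "<message><header><msg-date></msg-date></header>"
    "<record><remittance-details></remittance-details></record></message>\n"
    "<message><header><msg-date></msg-date></header>"
    "<record><remittance-details></remittance-details></record></message>\n"
)

root = parse_concatenated(sample)
messages = root.findall("message")  # one entry per line of the input

ET.indent(root, space="  ")  # pretty-print in place; needs Python 3.9+
pretty = ET.tostring(root, encoding="unicode")

# Point 2: a strict parser rejects an unclosed tag outright.
try:
    parse_concatenated("<message><record></message>")
    strict_ok = True
except ET.ParseError:
    strict_ok = False
```

Wrapping handles the missing root, but the unclosed tag still has to be fixed in the data (or handled by a recovering parser) before the whole file can be parsed reliably.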
