
Parsing XML with no closing tags in Java

I am having trouble parsing an XML document with no closing tags. Please see the snippet of the XML below.

I have tried the SAX and StAX parsers, but both need well-formed XML with closing tags, e.g. <Name>XXYY</Name>. As you can see below, this format is a little different. Is there an API that can parse this, or a way to make SAX/StAX do what I want?

<Employees>
 <Employee>
  <Detail>
    <Date>2018014
    <Name>XXYY
    <Age>0
    <LANGUAGE>ENG
    <Manager>
    <MName>YYXX
    <MID>5959
    </Manager>
    <EmployeeID>1234
  </Detail>
 </Employee>
</Employees>

You could "fix" the XML by adding all the missing end-tags.

Any start-tag that is followed by text on the same line can be fixed by adding the matching end-tag at the end of that line.

The "contains text" rule ensures that, e.g., the <Manager> tag is left alone, since it is already closed by </Manager> three lines down.
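To see the rule in isolation, here is a minimal sketch that applies it one line at a time. The class name `EndTagRule` and the helper `closeIfNeeded` are made up for illustration, and the pattern assumes one element per line with simple tag names (no attributes), as in the sample.

```java
import java.util.regex.Pattern;

public class EndTagRule {
    // Indent, a start-tag, then at least one character of text on the same line.
    // Assumption: tag names are plain word characters and carry no attributes.
    private static final Pattern OPEN_WITH_TEXT =
            Pattern.compile("^(\\s*)<(\\w+)>([^<\\r\\n]+)$");

    /** Appends the matching end-tag if the line is "start-tag + text"; otherwise returns it unchanged. */
    static String closeIfNeeded(String line) {
        return OPEN_WITH_TEXT.matcher(line).replaceAll("$1<$2>$3</$2>");
    }

    public static void main(String[] args) {
        System.out.println(closeIfNeeded("    <Name>XXYY"));  // gets </Name> appended
        System.out.println(closeIfNeeded("    <Manager>"));   // no text after the tag: unchanged
        System.out.println(closeIfNeeded("    </Manager>"));  // an end-tag: unchanged
    }
}
```

Lines that are already closed, such as `<Name>XXYY</Name>`, are also left alone, because the text group must run to the end of the line without hitting another `<`.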

Example working code:

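import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Load file into memory
String xml = new String(Files.readAllBytes(Paths.get("test.xml")), StandardCharsets.UTF_8);

// Add missing end-tags; [^<\r\n]+ keeps the text on one line so an empty <Manager> line is never closed
xml = xml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<\\r\\n]+)$", "$1<$2>$3</$2>");

// Parse then print the XML, to ensure there are no errors
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                                          .parse(new InputSource(new StringReader(xml)));
TransformerFactory.newInstance().newTransformer()
                  .transform(new DOMSource(document), new StreamResult(System.out));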
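If you want to try this without a file, the sketch below runs the same idea on the sample from the question held in a string, then reads a few values back through the DOM. The class name `UnclosedTagDemo` and its helpers are hypothetical. The text group is restricted to a single line (`[^<\r\n]+`), which is my tightening so that a blank line after a start-tag cannot trigger a bogus end-tag.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class UnclosedTagDemo {
    // The sample from the question, one element per line.
    static final String SAMPLE = String.join("\n",
        "<Employees>",
        " <Employee>",
        "  <Detail>",
        "    <Date>2018014",
        "    <Name>XXYY",
        "    <Age>0",
        "    <LANGUAGE>ENG",
        "    <Manager>",
        "    <MName>YYXX",
        "    <MID>5959",
        "    </Manager>",
        "    <EmployeeID>1234",
        "  </Detail>",
        " </Employee>",
        "</Employees>");

    /** Adds an end-tag to every line that has a start-tag followed by text. */
    static String addEndTags(String xml) {
        return xml.replaceAll("(?m)^(\\s*)<(\\w+)>([^<\\r\\n]+)$", "$1<$2>$3</$2>");
    }

    /** Parses a string with the standard JAXP DOM parser; throws if it is not well-formed. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Text of the first element with the given tag name (assumes at least one exists). */
    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(addEndTags(SAMPLE));
        System.out.println(text(doc, "Name"));        // XXYY
        System.out.println(text(doc, "MName"));       // YYXX
        System.out.println(text(doc, "EmployeeID"));  // 1234
    }
}
```

Once the end-tags are in place, the repaired string can just as well be fed to SAX or StAX. DOM is used here only because it keeps the example short.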

That appears to be SGML, not XML. I've answered a newer question (for JavaScript/Node.js, but relevant to Java as well) detailing how to use the OpenSP SGML software to create XML from SGML.

