Parsing XML self-closing tags mixed with text

Question

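Hey everyone, I am trying to parse this part of my XML file. The problem is that the text contains a lot of self-closing tags. I can't remove these tags, because they give me some indexing details. How can I access the text without all the "Node" tags?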

<TextWithNodes>
 <Node id="0"/>A TEENAGER <Node
id="11"/>yesterday<Node id="20"/> accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2<Node id="146"/>.<Node
id="147"/>
</TextWithNodes>

Answer 1

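Odd as it looks, this XML is actually well-formed and can be parsed with ordinary XML tools. The TextWithNodes element simply has mixed content (text interleaved with child elements).

The string value of TextWithNodes can be obtained with a simple XPath expression,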

string(/TextWithNodes)

which yields the text you want, without any markup (self-closing or otherwise):

 A TEENAGER yesterday accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2.
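
Since the question says the Node tags carry indexing details, it may help to keep them while extracting the text. The following is a minimal sketch, not part of the original answer: it walks the mixed content with the standard DOM API, concatenates the text, and records the position in that text at which each Node element occurs. The class name `MixedContentWalk`, the `Result` holder and the `walk` method are hypothetical names chosen for illustration, and the sketch assumes TextWithNodes is the root element, as in the snippet above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper (not from the original answers): extracts the text of a
// mixed-content root element and remembers where each <Node id="..."/> sat.
final class MixedContentWalk {

    /** The concatenated text plus, per Node id, the text offset where it occurred. */
    static final class Result {
        final String text;
        final Map<String, Integer> offsets;

        Result(String text, Map<String, Integer> offsets) {
            this.text = text;
            this.offsets = offsets;
        }
    }

    static Result walk(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("could not parse XML", e);
        }

        StringBuilder text = new StringBuilder();
        Map<String, Integer> offsets = new LinkedHashMap<>();

        // Assumption: the root element is the mixed-content element itself.
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Text) {
                // Text (and CDATA) children contribute to the string value.
                text.append(child.getNodeValue());
            } else if (child instanceof Element && "Node".equals(child.getNodeName())) {
                // Record how much text precedes this marker element.
                offsets.put(((Element) child).getAttribute("id"), text.length());
            }
        }
        return new Result(text.toString(), offsets);
    }

    public static void main(String[] args) {
        String xml = "<TextWithNodes>\n"
                + " <Node id=\"0\"/>A TEENAGER <Node\n"
                + "id=\"11\"/>yesterday<Node id=\"20\"/> accused his parents of cruelty\n"
                + "by feeding him a daily diet of chips which sent his weight\n"
                + "ballooning to 22st at the age of l2<Node id=\"146\"/>.<Node\n"
                + "id=\"147\"/>\n"
                + "</TextWithNodes>";
        Result result = walk(xml);
        System.out.println(result.text);
        System.out.println(result.offsets);
    }
}
```

The recorded offsets are measured in the raw string value, so any whitespace before the first Node (such as the newline and space in the sample) counts toward them. The text returned by `walk` is the same string that `string(/TextWithNodes)` produces.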

Answer 2

Here is some sample code that uses XPath in Java, based on the answer at https://stackoverflow.com/a/49926918/2735286 (credit to @kjhughes):

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class TextWithNodesExample {
    public static void main(String[] args) throws Exception {
        String text = "<TextWithNodes>\n" +
                " <Node id=\"0\"/>A TEENAGER <Node\n" +
                "id=\"11\"/>yesterday<Node id=\"20\"/> accused his parents of cruelty\n" +
                "by feeding him a daily diet of chips which sent his weight\n" +
                "ballooning to 22st at the age of l2<Node id=\"146\"/>.<Node\n" +
                "id=\"147\"/>\n" +
                "</TextWithNodes>";
        // Parse the XML string into a DOM document.
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        // Evaluating the node set as a STRING returns the string value of the first TextWithNodes element.
        XPath xPath = XPathFactory.newInstance().newXPath();
        String expression = "//TextWithNodes";
        System.out.println(xPath.compile(expression).evaluate(xmlDocument, XPathConstants.STRING));
    }
}

This prints the following (shown here with the line breaks from the source collapsed):

A TEENAGER yesterday accused his parents of cruelty by feeding him a daily diet of chips which sent his weight ballooning to 22st at the age of l2.
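
The string value of TextWithNodes keeps the line breaks and leading whitespace of the source. If a single-line result like the one above is wanted directly, one option is the standard XPath 1.0 function `normalize-space()`, which trims the string and collapses each run of whitespace to one space. The sketch below is an illustration rather than part of the original answer; the class name `NormalizedText` and the method `extract` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper (not from the original answers): returns the text of
// /TextWithNodes with whitespace trimmed and collapsed.
final class NormalizedText {

    static String extract(String xml) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            // XPath.evaluate(String, InputSource) parses the source itself and
            // returns the result of the expression as a String.
            return xPath.evaluate("normalize-space(/TextWithNodes)",
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Wrap the checked exception so callers need no throws clause.
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TextWithNodes>\n"
                + " <Node id=\"0\"/>A TEENAGER <Node\n"
                + "id=\"11\"/>yesterday<Node id=\"20\"/> accused his parents of cruelty\n"
                + "by feeding him a daily diet of chips which sent his weight\n"
                + "ballooning to 22st at the age of l2<Node id=\"146\"/>.<Node\n"
                + "id=\"147\"/>\n"
                + "</TextWithNodes>";
        System.out.println(extract(xml));
    }
}
```

Note that collapsing whitespace changes character positions, so this form is only suitable when the Node markers are not going to be used as offsets into the text.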

Answer 3

Use a parser library that can handle XML, such as jsoup: https://jsoup.org/

How to do this is covered in the answers to this question: How to parse XML with jsoup

3 answers

Answer 1: score 2, 2018-04-19 17:29:12
Answer 2: score 1, accepted, 2018-04-19 17:44:07
Answer 3: score 0, 2018-04-19 17:23:38
