Parse XML Self-Closing Tags with Text
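Hey everyone, I'm trying to parse this part of my XML file. The problem is that the text contains a lot of self-closing tags. I can't remove these tags because they give me some indexing details. How can I access the text without all the "Node" tags?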
<TextWithNodes>
<Node id="0"/>A TEENAGER <Node
id="11"/>yesterday<Node id="20"/> accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2<Node id="146"/>.<Node
id="147"/>
</TextWithNodes>
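Although it looks odd, this XML is actually well-formed and can be parsed with ordinary XML tools. The TextWithNodes element simply has mixed content: text interleaved with child elements.
The string value of TextWithNodes can be obtained with a simple XPath expression,
string(/TextWithNodes)
which yields the text you want, without any other markup (self-closing or otherwise):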
A TEENAGER yesterday accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2.
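To make "mixed content" concrete, here is a minimal sketch that walks the children of the root element with the standard DOM API and lists element children and text children in document order. The class name MixedContentWalk and the method describeChildren are made up for illustration, and the input is a shortened version of the XML from the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original answers.
class MixedContentWalk {

    // Returns one entry per child of the root element:
    // "id=<value>" for an element child, "text=<content>" for a text child.
    static List<String> describeChildren(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                out.add("id=" + ((Element) child).getAttribute("id"));
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                out.add("text=" + child.getNodeValue());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample based on the question's XML.
        String xml = "<TextWithNodes><Node id=\"0\"/>A TEENAGER "
                + "<Node id=\"11\"/>yesterday<Node id=\"20\"/></TextWithNodes>";
        for (String entry : describeChildren(xml)) {
            System.out.println(entry);
        }
    }
}
```

Walking the children this way keeps the Node ids next to the text they delimit, which may help if the indexing details are still needed. In the question's sample the id values appear to be character offsets into the text (0, 11, 20, ...), though that is an observation about this sample, not something the question states.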
Here is some sample code using XPath in Java, based on the answer https://stackoverflow.com/a/49926918/2735286 (credit to @kjhughes):
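import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class TextWithNodesExample {
    public static void main(String[] args) throws IOException, ParserConfigurationException, SAXException, XPathExpressionException {
        String text = "<TextWithNodes>\n" +
                " <Node id=\"0\"/>A TEENAGER <Node\n" +
                "id=\"11\"/>yesterday<Node id=\"20\"/> accused his parents of cruelty\n" +
                "by feeding him a daily diet of chips which sent his weight\n" +
                "ballooning to 22st at the age of l2<Node id=\"146\"/>.<Node\n" +
                "id=\"147\"/>\n" +
                "</TextWithNodes>";
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(new ByteArrayInputStream(text.getBytes("UTF-8")));
        // Evaluating the node set as a STRING returns the string value of the first matching element.
        XPath xPath = XPathFactory.newInstance().newXPath();
        String expression = "//TextWithNodes";
        System.out.println(xPath.compile(expression).evaluate(xmlDocument, XPathConstants.STRING));
    }
}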
This prints (shown here with the line breaks collapsed into spaces):
A TEENAGER yesterday accused his parents of cruelty by feeding him a daily diet of chips which sent his weight ballooning to 22st at the age of l2.
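The string value of an element keeps the line breaks and leading whitespace of the source XML. If the text is wanted on a single line, the XPath 1.0 function normalize-space() can collapse runs of whitespace and trim both ends. The following is a minimal sketch under that assumption. The class name NormalizedText and the method singleLineText are hypothetical, and the input is a shortened version of the question's XML.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answers.
class NormalizedText {

    // Evaluates normalize-space() on the TextWithNodes root element,
    // returning its text with whitespace runs collapsed to single spaces.
    static String singleLineText(String xml) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return xPath.evaluate("normalize-space(/TextWithNodes)",
                new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample based on the question's XML.
        String xml = "<TextWithNodes>\n"
                + " <Node id=\"0\"/>A TEENAGER <Node\n"
                + "id=\"11\"/>yesterday<Node id=\"20\"/> accused his parents of cruelty\n"
                + "</TextWithNodes>";
        // Prints: A TEENAGER yesterday accused his parents of cruelty
        System.out.println(singleLineText(xml));
    }
}
```

Note that collapsing whitespace changes character positions, so this form is probably unsuitable if the Node ids are later used as offsets into the text.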
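Use a parser library such as jsoup (https://jsoup.org/), which is primarily an HTML parser but also has an XML mode.
How to do this is covered in the answers to this question: How to parse XML using jsoup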