How to grab text content wrapped in a CDATA tag from a piece of XML in Java
I have the following XML:
<?xml version="1.0"?>
<doOrchestration xmlns="http://comResponse.engine/response">
    <response uuid="86db9b58-312b-4cbb-8aa5-df3663884291">
        <headers>
            <header name="Content-Type">application/xml</header>
            <header name="Server">local-C++</header>
        </headers>
        <responseCode>200</responseCode>
        <content><![CDATA[<explanation></explanation>]]></content>
    </response>
</doOrchestration>
I want to parse the following text out of the content node:
<![CDATA[<explanation></explanation>]]>
Note that the content here is wrapped in a CDATA section. How can I do this in Java, using any approach?
Here is my code:
@Test
public void testGetDoOrchResponse() throws IOException {
    String path = "/Users/haddad/Git/Tools/ContentUtils/src/test/resources/testdata/doOrch_testfiles/doOrch_response.xml";
    File f = new File(path);
    String response = FileUtils.readFileToString(f);
    String content = getDoOrchResponse(response, "content");
    System.out.println("Content: "+content);
}
// Output: Content: blank
static String getDoOrchResponse(String xml, String tagFragment) throws FileNotFoundException {
    String content = new String();
    try {
        Document doc = getDocumentXML(xml);
        NodeList nlNodeExplanationList = doc.getElementsByTagName("response");
        for(int i=0;i<nlNodeExplanationList.getLength();i++) {
            Node explanationNode = nlNodeExplanationList.item(i);
            List<String> titleList = getTextValuesByTagName((Element)explanationNode, tagFragment);
            content = titleList.get(0);
        }
    }
    catch (IOException e) {
        e.printStackTrace();
    }
    return content;
}
static List<String> getTextValuesByTagName(Element element, String tagName) {
    NodeList nodeList = element.getElementsByTagName(tagName);
    ArrayList<String> list = new ArrayList<String>();
    for (int i = 0; i < nodeList.getLength(); i++) {
        String textValue = getTextValue(nodeList.item(i));
        if(textValue.equalsIgnoreCase("") ) {
            textValue = "blank";
        }
        list.add(textValue);
    }
    return list;
}
static String getTextValue(Node node) {
    StringBuffer textValue = new StringBuffer();
    int length = node.getChildNodes().getLength();
    for (int i = 0; i < length; i ++) {
        Node c = node.getChildNodes().item(i);
        if (c.getNodeType() == Node.TEXT_NODE) {
            textValue.append(c.getNodeValue());
        }
    }
    return textValue.toString().trim();
}
static Document getDocumentXML(String xml) throws FileNotFoundException {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db;
    Document doc = null;
    try {
        db = dbf.newDocumentBuilder();
        doc = db.parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));
        doc.getDocumentElement().normalize();
    }
    catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    }
    return doc;
}
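What exactly am I doing wrong? Why is the output blank? I just can't see it...
If you want to extract the content of an Element node, use the getTextContent() method. If you really need or want the CDATA section markup, you need to serialize the node with LSSerializer or something similar: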
// LSSerializer and DOMImplementationLS come from the org.w3c.dom.ls package.
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setNamespaceAware(true);
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse(new File("doc1.xml"));
Element content = (Element)doc.getElementsByTagNameNS("http://comResponse.engine/response", "content").item(0);
if (content != null)
{
    // Text only, without the CDATA markup
    System.out.println(content.getTextContent());
    LSSerializer ser = ((DOMImplementationLS)doc.getImplementation()).createLSSerializer();
    if (content.getFirstChild() != null)
    {
        // Serialize the CDATA section node itself to keep the markup
        System.out.println(ser.writeToString(content.getFirstChild()));
    }
}
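That is the theory. For me, Java JRE 1.8 outputs <![CDATA[<explanation></explanation> without the closing marker of the CDATA section, so it seems LSSerializer does not work properly on a single CDATA section node.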
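As for why the question's code printed "blank": with a default DocumentBuilderFactory (coalescing switched off), the parser keeps a CDATA section as a node of type Node.CDATA_SECTION_NODE, so a loop that only accepts Node.TEXT_NODE skips it. The following is a minimal sketch, not a definitive implementation, that accepts both node types and, as one possible way around the serializer quirk described above, rebuilds the CDATA markup by hand. The class name CdataDemo and the helpers parseContent, textOf and withCdataMarkup are hypothetical names made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class CdataDemo {

    // Hypothetical helper: parses the XML and returns the first <content>
    // element in the namespace used by the question's document.
    static Node parseContent(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Assumption: coalescing stays at its default (false), so CDATA
            // sections are kept as separate CDATA_SECTION_NODE children.
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagNameNS(
                    "http://comResponse.engine/response", "content").item(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Collects character data from direct children that are either plain
    // text nodes or CDATA section nodes.
    static String textOf(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            short type = c.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(c.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Re-wraps each CDATA child in its markup manually instead of relying
    // on a serializer; other child nodes are ignored.
    static String withCdataMarkup(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node c = children.item(i);
            if (c.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append("<![CDATA[").append(c.getNodeValue()).append("]]>");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<doOrchestration xmlns=\"http://comResponse.engine/response\">"
                + "<response><content><![CDATA[<explanation></explanation>]]></content>"
                + "</response></doOrchestration>";
        Node content = parseContent(xml);
        System.out.println(textOf(content));          // <explanation></explanation>
        System.out.println(withCdataMarkup(content)); // <![CDATA[<explanation></explanation>]]>
    }
}
```

If the markup is not needed, calling setCoalescing(true) on the factory is another option: the parser then converts CDATA sections into ordinary text nodes, and a TEXT_NODE-only loop like the one in the question would find the text.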