<node> test
test
test
</node>
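I want my XML parser to read the characters in <node> and:
1. collapse runs of whitespace into a single space;
2. if the node contains XML-encoded characters, such as tabs (&#x9;), newlines (&#xA;) or spaces (&#x20;), leave them as they are.
I'm trying the code below, but it preserves the duplicated whitespace.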
dbf = DocumentBuilderFactory.newInstance();
dbf.setIgnoringComments( true );
dbf.setNamespaceAware( namespaceAware );
db = dbf.newDocumentBuilder();
doc = db.parse( inputStream );
Is there any way to do what I want?
Thanks!
The first part - collapsing repeated whitespace - is relatively easy, though I don't think the parser will do it for you:
InputSource stream = new InputSource(inputStream);
XPath xpath = XPathFactory.newInstance().newXPath();
Document doc = (Document) xpath.evaluate("/", stream, XPathConstants.NODE);
NodeList nodes = (NodeList) xpath.evaluate("//text()", doc,
        XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
    Text text = (Text) nodes.item(i);
    text.setTextContent(text.getTextContent().replaceAll("\\s{2,}", " "));
}
// check results
TransformerFactory.newInstance()
        .newTransformer()
        .transform(new DOMSource(doc), new StreamResult(System.out));
This is the hard part:
If the node contains XML encoded characters: tabs (&#x9;), newlines (&#xA;) or whitespaces (&#x20;) - they should be left.
The parser will always turn "&#x9;" into "\t" - you may need to write your own XML parser.
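To illustrate the point, here is a minimal sketch (not part of the original answer). It first shows that a standard JAXP parser hands the application a real tab for "&#x9;", and then tries one possible workaround short of writing a parser: escaping the ampersand of numeric character references before parsing, so they survive as literal text. The class name CharRefDemo and the helpers rootText and protectCharRefs are hypothetical names made up for this example. The regex-based escaping is an assumption that the input has no CDATA sections or comments containing such references.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharRefDemo {

    // Parse an XML string and return the text content of its root element.
    public static String rootText(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Hypothetical workaround: rewrite "&#x9;" as "&amp;#x9;" in the raw text,
    // so the parser reports the literal characters "&#x9;" instead of a tab.
    // Assumption: no CDATA sections or comments contain such references.
    public static String protectCharRefs(String xml) {
        return xml.replaceAll("&#(x[0-9A-Fa-f]+|[0-9]+);", "&amp;#$1;");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<node>a&#x9;b  \n  c</node>";

        // The reference is expanded by the parser: the text holds a real tab.
        System.out.println(rootText(xml).contains("\t"));

        // With the references protected, whitespace can be collapsed safely.
        String collapsed = rootText(protectCharRefs(xml)).replaceAll("\\s+", " ");
        System.out.println(collapsed);
    }
}
```

Note that if the document is serialized again afterwards, the protected references will be written out as "&amp;#x9;", so the escaping would have to be reversed on the output text as well.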
According to the author of Saxon:
I don't think any XML parser will report numeric character references to the application - they will always be expanded. Really, your application shouldn't care about this any more than it cares about how much whitespace there is between attributes.