
How to preserve newlines in CDATA when generating XML?

I want to write some text that contains whitespace characters, such as newlines and tabs, into an XML file, so I use:

Element element = xmldoc.createElement("TestElement");
element.appendChild(xmldoc.createCDATASection(somestring));

But when I read this back in using:

Node vs =  xmldoc.getElementsByTagName("TestElement").item(0);
String x = vs.getFirstChild().getNodeValue();

I get a string that no longer has any newlines.
When I look directly at the XML on disk, the newlines seem to be preserved, so the problem occurs when reading the XML file back in.

How can I preserve the newlines?

Thanks!

I don't know how you parse and write your document, but here's an enhanced code example based on yours:

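import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// The class name is arbitrary; it only makes the example runnable as-is.
public class CdataRoundTrip {
    public static void main(String[] args) throws Exception {
        // creating the document in-memory
        Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element element = xmldoc.createElement("TestElement");
        xmldoc.appendChild(element);
        element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));

        // serializing the xml to a string
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        String str = writer.writeToString(xmldoc);

        // printing the xml for verification of whitespace in cdata
        System.out.println("--- XML ---");
        System.out.println(str);

        // de-serializing the xml from the string; writeToString declares UTF-16
        // in the XML prolog, so the bytes are encoded as UTF-16 to match
        ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_16));
        Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);

        // reading the CDATA node back, as in the question
        Node vs = xmldoc2.getElementsByTagName("TestElement").item(0);
        Node child = vs.getFirstChild();
        String x = child.getNodeValue();

        // print the value, yay!
        System.out.println("--- Node Text ---");
        System.out.println(x);
    }
}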

Serialization using LSSerializer is the W3C way to do it. The output is as expected, with line separators:

--- XML ---
<?xml version="1.0" encoding="UTF-16"?>
<TestElement><![CDATA[first line
second line
]]></TestElement>
--- Node Text ---
first line
second line

You need to check the type of each node using node.getNodeType(). If the type is CDATA_SECTION_NODE, you need to concatenate the CDATA guards around node.getNodeValue().
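A minimal sketch of that check, assuming a default (non-coalescing) DocumentBuilderFactory so that CDATA sections reach the DOM as their own nodes. The class and the helpers valueWithGuards and firstChildValue are hypothetical names used only for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CdataGuardDemo {

    // Hypothetical helper: returns the node's value, wrapped in the CDATA
    // guards when the node is a CDATA section, and unchanged otherwise.
    static String valueWithGuards(Node node) {
        if (node.getNodeType() == Node.CDATA_SECTION_NODE) {
            return "<![CDATA[" + node.getNodeValue() + "]]>";
        }
        return node.getNodeValue();
    }

    // Hypothetical helper: parses an XML string and applies valueWithGuards
    // to the first child of the root element.
    static String firstChildValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return valueWithGuards(doc.getDocumentElement().getFirstChild());
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TestElement><![CDATA[first line\nsecond line\n]]></TestElement>";
        System.out.println(firstChildValue(xml));
    }
}
```

If the factory is configured with setCoalescing(true), CDATA sections are merged into ordinary text nodes and the CDATA_SECTION_NODE branch would not be taken.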

You don't necessarily have to use CDATA to preserve whitespace characters. The XML specification specifies how to encode these characters.

For example, if an element's value contains a newline (line feed), you can encode it as:

  &#xA;

Carriage return:

 &#xD;

And so forth.
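As a rough illustration of the character-reference approach, the sketch below encodes line feeds, carriage returns and tabs as numeric references and parses the result back. The class and the helpers encode and roundTrip are hypothetical, not part of any library:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CharRefDemo {

    // Hypothetical helper: escapes markup characters and writes LF, CR and
    // TAB as numeric character references.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\n': out.append("&#xA;"); break;
                case '\r': out.append("&#xD;"); break;
                case '\t': out.append("&#x9;"); break;
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;");  break;
                case '>':  out.append("&gt;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical helper: wraps the encoded text in an element, parses it,
    // and returns the text content the parser reports.
    static String roundTrip(String text) {
        String xml = "<TestElement>" + encode(text) + "</TestElement>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String original = "first line\r\nsecond\tline\n";
        System.out.println(original.equals(roundTrip(original)));
    }
}
```

The carriage return is the case where this matters most: a literal CR or CR LF in the file is normalized to a single LF by the parser, whereas a character reference such as &#xD; is passed through unchanged.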

EDIT: cut all the irrelevant stuff.

I'm curious to know what DOM implementation you're using, because what you describe doesn't match the default behaviour of the one in a couple of JVMs I've tried (they ship with a Xerces implementation). I'm also interested in which newline characters your document contains.

I'm not sure whether it is a given that CDATA should preserve whitespace. I suspect that there are many factors involved. Don't DTDs/schemas affect how whitespace is processed?

You could try using the xml:space="preserve" attribute.

xml:space='preserve' is not the answer here. It only matters for "all whitespace" nodes, that is, if you want to keep the whitespace nodes in:

<this xml:space='preserve'> <has/>
<whitespace/>
</this>

But note that those whitespace nodes contain ONLY whitespace.
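To see which nodes are meant, here is a small sketch that counts the whitespace-only text nodes directly under the root element. It assumes a default, non-validating DocumentBuilderFactory; the class and the helper countWhitespaceOnlyChildren are hypothetical names:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceNodeDemo {

    // Hypothetical helper: counts the direct children of the root element
    // that are text nodes consisting only of whitespace.
    static int countWhitespaceOnlyChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE
                        && child.getNodeValue().trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<this xml:space='preserve'> <has/>\n<whitespace/>\n</this>";
        // One node before <has/>, one between the elements, one before </this>.
        System.out.println(countWhitespaceOnlyChildren(xml));
    }
}
```

None of these nodes carries the text of a CDATA section, which is why xml:space has no bearing on the newlines inside one.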

I have also been struggling to get Xerces to generate events that allow isolating CDATA content. I have no solution as yet.
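One possible direction, offered as a sketch rather than a confirmed fix for the setup described above: SAX exposes CDATA boundaries through org.xml.sax.ext.LexicalHandler, which can be registered via the standard lexical-handler property. The class, the CdataCollector handler and the extractCdata helper are hypothetical names:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

class CdataEventsDemo {

    // Hypothetical handler: collects only the characters reported between
    // startCDATA() and endCDATA().
    static class CdataCollector extends DefaultHandler2 {
        private final StringBuilder cdata = new StringBuilder();
        private boolean inCdata;

        @Override
        public void startCDATA() {
            inCdata = true;
        }

        @Override
        public void endCDATA() {
            inCdata = false;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // characters() may be called several times for one section,
            // so the chunks are accumulated.
            if (inCdata) {
                cdata.append(ch, start, length);
            }
        }

        String cdataText() {
            return cdata.toString();
        }
    }

    // Hypothetical helper: returns the concatenated CDATA content of a document.
    static String extractCdata(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CdataCollector handler = new CdataCollector();
            // Without this property the parser does not report CDATA boundaries.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.cdataText();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a>pre<![CDATA[first line\nsecond line\n]]>post</a>";
        System.out.print(extractCdata(xml));
    }
}
```

DefaultHandler2 is used here only because it already implements both ContentHandler and LexicalHandler; a separate LexicalHandler implementation would work the same way.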
