
Parse XML multi line string in Java

I'm trying to parse a multi line XML attribute in Java using the classic DOM. The parsing is working just fine. However, it's destroying the line breaks, so when I render my parsed string, line breaks get replaced by simple spaces.

<string key="help_text" value="This is a multi line long
                               text. This should be parsed
                               and rendered in multiple lines" />

To get the attribute I'm using:

attributes.getNamedItem("value").getTextContent()

If I just pass a manually typed string to the render method using "\n", the text gets drawn as intended.

Any ideas?

According to the XML specification, the XML parser MUST normalize attribute whitespace, for example by replacing a line break character with a space. That is, if you need line breaks to be preserved, you cannot put literal line breaks in an attribute value. Only a line break written as a character reference (such as &#10;) survives normalization.
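To see the normalization in action, here is a minimal sketch using the JDK's built-in DOM parser. The class name AttributeNormalizationDemo and the helper readValue are made up for illustration, and the sample XML is shortened from the question. It compares a literal line break with one written as the character reference &#10;.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AttributeNormalizationDemo {

    // Hypothetical helper: parses the XML and returns the "value"
    // attribute of the root element.
    static String readValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("value");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Literal line break inside the attribute: the parser is
        // required to turn it into a space.
        String literal = readValue(
                "<string key=\"help_text\" value=\"line one\nline two\" />");

        // Line break written as a character reference: the referenced
        // character is appended as-is, so the line feed is kept.
        String reference = readValue(
                "<string key=\"help_text\" value=\"line one&#10;line two\" />");

        System.out.println("literal keeps line break:   " + literal.contains("\n"));
        System.out.println("reference keeps line break: " + reference.contains("\n"));
    }
}
```

If the XML file is under your control, writing &#10; where a line break is wanted is probably the smallest change that keeps the attribute.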

In general, whitespace handling in XML is a lot of trouble. In particular, the difference between CR, LF, and CRLF isn't preserved anywhere.
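As a rough illustration of the CR/LF point, the sketch below parses element content that mixes CRLF, a lone CR and a lone LF. The class name ElementLineBreakDemo, the helper readText and the <help> element are assumptions made for this example only. It also shows that element content, unlike an attribute, keeps its line breaks (as LF).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ElementLineBreakDemo {

    // Hypothetical helper: parses the XML and returns the text content
    // of the root element.
    static String readText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The input mixes CRLF, a lone CR and a lone LF.
        String text = readText("<help>one\r\ntwo\rthree\nfour</help>");

        // End-of-line handling should deliver every line break as LF.
        System.out.println("all LF:       " + text.equals("one\ntwo\nthree\nfour"));
        System.out.println("no CR remains: " + (text.indexOf('\r') < 0));
    }
}
```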

You might find it better to encode newlines in attributes as &lt;br /&gt; (that is, the encoded version of <br />) and then decode them later.
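A minimal sketch of that encode/decode idea, assuming the marker is exactly <br /> once the parser has unescaped it. The class name BrMarkerDemo and the helpers encodeBreaks, decodeBreaks and readValue are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class BrMarkerDemo {

    // The marker as the application sees it, after the parser has
    // turned &lt; and &gt; back into < and >.
    static final String MARKER = "<br />";

    // Replaces real line breaks with the marker before writing.
    // A serializer (or manual escaping) still has to escape < and >.
    static String encodeBreaks(String text) {
        return text.replace("\n", MARKER);
    }

    // Replaces the marker with real line breaks after parsing.
    static String decodeBreaks(String attributeValue) {
        return attributeValue.replace(MARKER, "\n");
    }

    // Hypothetical helper: returns the "value" attribute of the root element.
    static String readValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("value");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<string key=\"help_text\" "
                + "value=\"This is a multi line long&lt;br /&gt;text.\" />";
        // Prints the text on two lines.
        System.out.println(decodeBreaks(readValue(xml)));
    }
}
```

Note that if the attribute is still wrapped across several lines in the file, the normalized spaces stay in the value next to the marker, so it may be simplest to keep the attribute on one line.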

I've used JDom for this in the past. It saves you a lot of trouble when decoding multi-line attributes and really enhances XML parsing/writing in Java. JDom is also compatible with Android development and it's really tiny (only one jar file).

https://github.com/hunterhacker/jdom

From the XML specification: 3.3.3 Attribute-Value Normalization. You will see that every white space character is normalized to a single space:

Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

1. All line breaks MUST have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.

2. Begin with a normalized value consisting of the empty string.

3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

   - For a character reference, append the referenced character to the normalized value.

   - For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.

   - For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.

   - For another character, append the character to the normalized value.
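The quoted algorithm can be sketched in Java roughly as follows. This is a simplified illustration, not what any real parser does internally: it deliberately leaves out character references and entity references and only covers end-of-line handling plus the white space and "another character" cases. The class name AttributeNormalizationSketch and the method normalize are hypothetical.

```java
class AttributeNormalizationSketch {

    // Simplified sketch: assumes the raw value contains no character
    // references or entity references.
    static String normalize(String raw) {
        // End-of-line handling: CRLF and lone CR both become LF.
        String text = raw.replace("\r\n", "\n").replace('\r', '\n');

        // Begin with a normalized value consisting of the empty string.
        StringBuilder normalized = new StringBuilder();

        // Walk the value from the first character to the last.
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ' ' || c == '\r' || c == '\n' || c == '\t') {
                // White space character: append a single space.
                normalized.append(' ');
            } else {
                // Any other character: append it unchanged.
                normalized.append(c);
            }
        }
        return normalized.toString();
    }

    public static void main(String[] args) {
        String raw = "This is a multi line long\r\n\ttext.";
        // The CRLF and the tab each become one space.
        System.out.println("[" + normalize(raw) + "]");
    }
}
```

Each white space character is replaced one for one, so in this sketch runs of spaces are not collapsed; the indentation in the question's attribute would come through as a run of spaces.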
