
Java XMLStreamReader converts &quot; to "

Original post: 2018-04-26 08:15:31 | Tags: java, xmlstreamreader

Suppose we have the following XML:

<Test> <Description> &quot;Hi&quot; </Description> </Test>

I load this XML using XMLStreamReader and parse it using the reader object. When I print the characters encountered while parsing, using the reader's getText(), I see that &quot; is printed as ". Although " (the double quote) did not need to be escaped as &quot; in the first place, I would like to know why the parser automatically does this conversion when the escaping is not required. For instance, &lt;, &gt; and &amp; are preserved, without which the resulting XML would be invalid. However, this is not the case for &quot; and &apos;. I have to save the description the same way I receive it. Is it possible to do that with the XMLStreamReader API?
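For reference, here is a minimal sketch of the situation described above, assuming the JDK's built-in StAX implementation. The class name QuotDemo and the helper readText are made up for illustration; only the javax.xml.stream calls are real API.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class QuotDemo {
    // Hypothetical helper: concatenates every CHARACTERS event the reader reports.
    static String readText(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    // getText() returns the character data with predefined
                    // entities such as &quot; already replaced.
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Test> <Description> &quot;Hi&quot; </Description> </Test>";
        // Prints "Hi" with literal double quotes, not &quot;Hi&quot;.
        System.out.println(readText(xml).trim());
    }
}
```

The reader never hands back the entity reference itself, so the input spelling (&quot; versus ") is not visible through getText().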

1 answer

I have to save the description the same way I receive it.

You should not. As far as XML is concerned, &quot; and " are exactly the same thing, so it cannot matter to you whether you obtain one or the other.

As for why it's happening: it is an XML parser's job to unescape escaped characters so that it presents you with the data they stand for. It also unescapes &lt; and so on. However, when the text obtained this way is serialized back into XML, the serializer will again escape characters such as <, because XML requires it, but it won't bother escaping ", because that is not necessary.
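To make the parse-then-serialize asymmetry concrete, here is a rough sketch using the standard javax.xml.stream reader and writer. The class RoundTripDemo and its two helpers are hypothetical names, and how the writer treats " in element text is implementation-dependent, so take the quote handling as what I would expect from the JDK's bundled implementation rather than a guarantee.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class RoundTripDemo {
    // Hypothetical helper: returns the unescaped text of the first <Description>.
    static String parseDescription(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Description".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return "";
        } finally {
            reader.close();
        }
    }

    // Hypothetical helper: serializes the text back into a <Description> element.
    static String writeDescription(String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("Description");
        writer.writeCharacters(text); // the writer escapes what XML requires
        writer.writeEndElement();
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String input = "<Description>&quot;a &lt; b&quot;</Description>";
        String text = parseDescription(input);   // entities are gone at this point
        String output = writeDescription(text);  // < must be escaped again
        System.out.println(text);
        System.out.println(output);
        // Both documents carry the same data even if their spelling differs.
        System.out.println(parseDescription(output).equals(text));
    }
}
```

Whatever the writer does with the quotes, reparsing its output yields the same text, which is the sense in which the two spellings are equivalent.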

When you parse XML and then serialize it again, there is no concept of "preserving" the escapes as-is. That information is inherently lost in the conversion. The parser simply is not in charge of preserving this unneeded information. However, if you want " to always be escaped as &quot; in the resulting XML, your XML serializer might have an option for that (you gave no details about what you're using, so I can't tell you definitely whether it does).
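If the serializer turns out to have no such option, one fallback is to escape the text yourself and assemble the element by hand. The sketch below is only that fallback, not an option of any particular serializer; QuoteEscaper and escapeText are hypothetical names.

```java
public class QuoteEscaper {
    // Hypothetical helper: escapes element text, including the quote
    // characters that XML does not require to be escaped there.
    static String escapeText(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String element = "<Description>" + escapeText("\"Hi\"") + "</Description>";
        System.out.println(element); // <Description>&quot;Hi&quot;</Description>
    }
}
```

Note that the escaped string must be written out raw. Passing it to something like XMLStreamWriter.writeCharacters would escape the & a second time and produce &amp;quot;.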