
Unmarshaling XML CDATA strings as literals using JAXB

Consider the following simple XML string:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<example>
    <value name="test">abcd</value>
</example>

The following code defines two Java classes (Example and its nested class Value) that can be used to produce the above XML output for the string value abcd:

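import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlValue;

@XmlRootElement(name = "example")
public class Example {
  private Value value;

  // JAXB needs a no-arg constructor; it may be private
  private Example() {}

  public Value getValue() { return value; }
  public void setValue(Value value) { this.value = value; }

  // Public so that callers outside Example can use the type returned by getValue()
  @XmlAccessorType(XmlAccessType.FIELD)
  public static final class Value {
    @XmlValue
    private String value;   // text content of <value>, e.g. "abcd"

    @XmlAttribute(name = "name")
    private String name;    // the name="test" attribute

    public Value() {}

    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
  }
}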

To unmarshal (deserialize) the above XML string back into the Example object it was produced from, one can use the following code:

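  // Assumed to be declared inside Example; needs these imports:
  //   java.io.ByteArrayInputStream, java.nio.charset.StandardCharsets,
  //   javax.xml.bind.JAXBContext, javax.xml.bind.Unmarshaller
  public static void main(String[] args) throws Exception {
    JAXBContext context = JAXBContext.newInstance(Example.class);
    String input = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n" +
      "<example>\n" +
      "    <value name=\"test\">abcd</value>\n" +
      "</example>";
    Unmarshaller um = context.createUnmarshaller();
    // Encode explicitly as UTF-8 to match the encoding declared in the XML prolog
    Example v = (Example) um.unmarshal(new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)));
    System.out.println(v.getValue().getValue()); // prints abcd
  }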

However, if the string value in the above main() method is changed from abcd to abcd<> (or to any other string that would normally need a CDATA section), the unmarshaller throws an exception:

org.xml.sax.SAXParseException; The content of elements must consist of well-formed character data or markup.

A proposed solution is to use a custom DomHandler with an @XmlAnyElement annotation, but it does not seem to work.

Is there any way of deserializing the abcd<> string as a literal (i.e., without enclosing it in a CDATA section)?


No, because your XML won't be valid.

The problem is that the input to unmarshal is unknown in advance, so such preprocessing is not possible.

You will need to ensure your inputs are valid XML to use any XML tool.

Invalid XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<example>
    <value name="test">abcd<></value>
</example>

When you tried to parse the above XML, you got the following exception. The exception comes from the underlying XML parser used by JAXB, not from JAXB itself. XML parsers treat an unescaped < as the start of markup such as an element tag, so special care must be taken when including angle brackets in element content.

org.xml.sax.SAXParseException; The content of elements must consist of well-formed character data or markup.
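To check that the rejection really happens at the parser level, independent of JAXB, here is a minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers) as a stand-in for the parser JAXB uses. The class name WellFormedCheck and the method parseError are hypothetical, chosen for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns null if the string is well-formed XML, otherwise the parser's error message.
    static String parseError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String invalid = "<example><value name=\"test\">abcd<></value></example>";
        String escaped = "<example><value name=\"test\">abcd&lt;&gt;</value></example>";
        System.out.println(parseError(invalid)); // the SAXParseException message
        System.out.println(parseError(escaped)); // prints null
    }
}
```

Any tool built on a conforming XML parser will stop at the same point, which is why the input has to be fixed before it reaches JAXB.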

Made Valid Using Parsed Character Data

One way to make the XML valid is to replace < with &lt; and > with &gt;. Your JAXB implementation will unmarshal the XML value abcd&lt;&gt; to the String value abcd<>.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<example>
    <value name="test">abcd&lt;&gt;</value>
</example>
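A minimal sketch of how the escaped form is read back, again using the JDK's DOM parser instead of JAXB (an assumption: the entity handling shown is standard parser behavior, which JAXB inherits). The names EntityDemo and valueText are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDemo {

    // Parses the XML and returns the text content of the first <value> element;
    // the parser expands &lt; and &gt; back into < and >.
    static String valueText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("value").item(0).getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
                + "<example>\n"
                + "    <value name=\"test\">abcd&lt;&gt;</value>\n"
                + "</example>";
        System.out.println(valueText(xml)); // prints abcd<>
    }
}
```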

Made Valid Using Character Data

Another way to make the XML valid is to wrap the character content in a CDATA block. JAXB will unmarshal <![CDATA[abcd<>]]> as abcd<>. On marshaling, it will write the content to XML as abcd&lt;&gt;.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<example>
    <value name="test"><![CDATA[abcd<>]]></value>
</example>
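Both halves of that round trip can be sketched with the JDK's DOM parser and its identity Transformer, used here as a stand-in for JAXB's unmarshaller and marshaller (an assumption; the exact serialized form JAXB produces may differ in details such as the XML declaration). CdataDemo, valueText and serializeValue are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CdataDemo {

    // Parses the XML and returns the text of the first <value> element;
    // the content of a CDATA section comes back as plain text.
    static String valueText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("value").item(0).getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Stores the text in an ordinary text node (not CDATA) and serializes it,
    // so the serializer has to escape the markup characters.
    static String serializeValue(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element value = doc.createElement("value");
            value.setTextContent(text);
            doc.appendChild(value);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<example>\n"
                + "    <value name=\"test\"><![CDATA[abcd<>]]></value>\n"
                + "</example>";
        String text = valueText(xml);
        System.out.println(text);                 // prints abcd<>
        System.out.println(serializeValue(text)); // < comes out as &lt;
    }
}
```

The CDATA wrapper is purely a lexical device: once parsed, the value is just the string abcd<>, and a serializer is free to write it back escaped instead of as CDATA.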

The only way is to replace the special characters " & ' < > with references such as &quot;, &amp;, &apos;, &lt; and &gt;.

I think you will have to replace the special characters in your XML string, as follows:

abcd&lt;&gt; instead of `abcd<>`
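A minimal sketch of such a replacement, applied to a value before it is embedded in the XML string (it cannot repair a document that is already malformed). XmlEscape and escapeXml are hypothetical names; in practice a library escaper may be preferable.

```java
public class XmlEscape {

    // Replaces the five XML special characters with predefined entity references.
    // Working character by character in one pass avoids double-escaping the
    // '&' introduced by earlier replacements.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "abcd<>";
        String xml = "<example>\n    <value name=\"test\">" + escapeXml(value) + "</value>\n</example>";
        System.out.println(xml); // the value appears as abcd&lt;&gt;
    }
}
```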

Have you tried creating a method that wraps the value in CDATA tags before unmarshalling?

 public String addCdataTags(String yourString) {
        // Note: produces invalid XML if yourString itself contains "]]>"
        return "<![CDATA[" + yourString + "]]>";
     }

This should stop strings such as abcd<> from being mistaken for XML element tags.
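One caveat with that helper: a CDATA section ends at the first ]]>, so a value containing that sequence breaks it. A common workaround, sketched below, splits the sequence across two adjacent CDATA sections; the names CdataWrap, toCdata and parseValue are hypothetical, and the check uses the JDK's DOM parser rather than JAXB.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class CdataWrap {

    // Wraps text in CDATA. Each "]]>" in the text is split so that "]]" ends one
    // section and ">" starts the next, which is legal and parses back to "]]>".
    static String toCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Parses a single-element document and returns the root element's text content.
    static String parseValue(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<value>" + toCdata("abcd<> and ]]>") + "</value>";
        System.out.println(parseValue(xml)); // prints abcd<> and ]]>
    }
}
```

Like the original helper, this only works on a value you control before the XML is assembled; it cannot fix an already-malformed input document.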

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM