
Avoiding special characters in the XML output using JAXB

I am reading tweets and building an XML document from them, using a JAXB Marshaller and UTF-8 encoding.

The Marshaller is set up like this:

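import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;

StringWriter writer = new StringWriter();
JAXBContext jaxbContext = JAXBContext.newInstance(obj.getClass());
Marshaller m = jaxbContext.createMarshaller();
m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true); // indent the output
m.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");      // UTF-8 is also the JAXB default
m.marshal(obj, writer);                                // obj holds the tweet data
String xml = writer.toString();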

Here, obj is an instance of my own class that holds the tweet text and other information.
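Because the marshaller writes into a StringWriter, its result is a Java String: the characters only become bytes when that String is written out, and that step uses whatever charset the writing code chooses. One possible cause worth ruling out (the question does not confirm it) is that the file is written with a platform default such as windows-1252, where the single bytes 0x85, 0x93 and 0xA0 stand for the ellipsis, the left double quotation mark and the non-breaking space. A minimal sketch, assuming the XML ends up in a file; the class name XmlFileWriter and the sample text are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes the marshalled XML string with an explicit charset.
class XmlFileWriter {

    // Encodes the String as UTF-8 bytes instead of relying on the platform default charset.
    static void writeUtf8(Path target, String xml) throws IOException {
        Files.write(target, xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("tweets", ".xml");
        // U+2026 (horizontal ellipsis), a character that often appears in tweet text
        writeUtf8(target, "<text>wait\u2026</text>");
        byte[] bytes = Files.readAllBytes(target);
        // In UTF-8 the ellipsis is the three bytes E2 80 A6, not the single byte 0x85
        for (byte b : bytes) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
        Files.delete(target);
    }
}
```

Whatever reads the file afterwards then needs to decode it as UTF-8 as well, which matches the encoding declared in the XML prolog.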

The problem is that the generated XML contains special characters such as:

x85, x93, xA0

Sample output XML:

    <tweet>
        <id>500923859663872000</id>
        <createdAt>2014-08-17T14:05:29+05:30</createdAt>
        <text>Ԁhughwizzy: 55% of all '14-'15 @PremierLeague players will wear @Nike Boots. (@adidas 35%, @Puma 5%). http://t.co/VHit1Es7KlԠ@Yup_Yup9</text>
        <langISOCode>en</langISOCode>
        <place>NA</place>
        <favouriteCount>0</favouriteCount>
        <retweetCount>0</retweetCount>
        <isPossiblySensitive>false</isPossiblySensitive>
        <user>
            <id>39481349</id>
            <createdAt>2009-05-12T17:12:37+05:30</createdAt>
            <location>NA</location>
            <followersCount>281</followersCount>
            <listedCount>4</listedCount>
            <preferredLang>en</preferredLang>
            <isVerified>false</isVerified>
            <isTranslator>false</isTranslator>
        </user>
    </tweet>

I found that these are UTF-8 encoded characters, but they make my XML invalid.
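Which characters may appear in an XML 1.0 document is defined by the Char production of the XML 1.0 specification: #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. Read as Unicode code points, U+0085, U+0093 and U+00A0 all fall inside that set (the first two are only discouraged), so if a parser still rejects the document, the culprit may be a different character, such as a C0 control, or an encoding mismatch when the XML is saved. A minimal diagnostic sketch that lists the offending code points in a string; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper based on the XML 1.0 Char production.
class XmlCharCheck {

    // True if the code point may appear in an XML 1.0 document.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns every code point in text that XML 1.0 does not allow, formatted as U+XXXX.
    // Unpaired surrogates come through codePoints() as 0xD800-0xDFFF and are reported too.
    static List<String> illegalCodePoints(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !isXml10Char(cp))
            .forEach(cp -> found.add(String.format("U+%04X", cp)));
        return found;
    }

    public static void main(String[] args) {
        String sample = "RT\u0001 \u0085\u0093\u00A0 ok";
        System.out.println(illegalCodePoints(sample)); // prints [U+0001]
    }
}
```

Running this over the tweet text before marshalling shows whether any character is actually outside the allowed set, rather than guessing from how the output looks in an editor.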

Is there a way to avoid these characters in the generated XML?

You can Base64-encode the string before you assign it to the field in your object. If you want to display the content in a browser, extract the XML value of that text element and decode it before displaying it.
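A minimal sketch of that round trip in Java, assuming the text is converted to bytes as UTF-8. It uses java.util.Base64 (Java 8 and later) instead of the DatatypeConverter call in the answer; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for storing tweet text as Base64 inside an XML element.
class TweetTextCodec {

    // UTF-8 bytes of the text, Base64-encoded: only A-Z, a-z, 0-9, '+', '/' and '=' appear.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses encode(): Base64-decode, then interpret the bytes as UTF-8.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String tweet = "quote \u201Chi\u201D and ellipsis\u2026";
        String stored = encode(tweet); // the value you would pass to the text setter
        System.out.println(stored);
        System.out.println(decode(stored).equals(tweet)); // prints true
    }
}
```

The trade-off is that the text is no longer human-readable in the XML, and every consumer of the document has to decode it.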

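import java.nio.charset.StandardCharsets;
import javax.xml.bind.DatatypeConverter;
// tweetText is a placeholder for your tweet String: encode it as UTF-8 bytes, then Base64
obj.setText(DatatypeConverter.printBase64Binary(tweetText.getBytes(StandardCharsets.UTF_8)));
// then marshal obj as before
// In the front end, extract the value of the text element with JavaScript, e.g. <text>adiufgdb12bsre==</text>
// atob() returns one character per byte, so decode those bytes as UTF-8 to get the original string back:
const bytes = Uint8Array.from(atob(extractedEncodedString), c => c.charCodeAt(0));
const decoded = new TextDecoder("utf-8").decode(bytes); // the original tweet text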

This way you avoid creating invalid XML, because the Base64 output only uses A-Z, a-z, 0-9, +, / and =, none of which are characters XML disallows.
