

XML with special character, encoding UTF-8

I have a few simple questions, because I got confused reading all the different responses.

1) Suppose I have an XML document with the prolog <?xml version="1.0" encoding="utf-8" ?> and I'm going to unmarshal it with Java (for example, with JAXB). I suppose that I can't put the CROSS OF LORRAINE ( http://www.fileformat.info/info/unicode/char/2628/index.htm ) inside it directly, but I can put "\u2628", correct?

2) I've also heard that UTF-8 doesn't contain it, but that anything in Unicode can be saved with the UTF-8 (or UTF-16) encoding, and here is an example from that page:

UTF-8 (hex) 0xE2 0x98 0xA8 (e298a8)

Is my reasoning correct? Can I use this form and put it in the XML with UTF-8 encoding?

If your prolog specifies UTF-8 encoding for the XML:

<?xml version="1.0" encoding="utf-8" ?>

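then you can use the character directly (stored as UTF-8 bytes), or you can write it as the numeric character reference &#9768; (the decimal form of U+2628; &#x2628; is the hexadecimal form). Note that "\u2628" is Java string-literal syntax, not XML: inside an XML document it is just six ordinary characters.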
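To make the difference between the raw character, a character reference and the Java-style "\u2628" concrete, here is a minimal sketch. It assumes a JDK-only setup and uses the built-in DOM parser (javax.xml.parsers) instead of JAXB, which has not been bundled with the JDK since Java 11; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class CharRefDemo {

    // Wraps the given markup in a document whose prolog declares utf-8,
    // encodes it as UTF-8 (so the bytes match the declaration), parses it
    // with the JDK's DOM parser and returns the root element's text.
    static String rootText(String body) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\" ?><r>" + body + "</r>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The Java compiler turns this source escape into the single character U+2628.
        String cross = "\u2628";
        System.out.println(rootText(cross).equals(cross));      // true: raw character
        System.out.println(rootText("&#9768;").equals(cross));  // true: decimal reference
        System.out.println(rootText("&#x2628;").equals(cross)); // true: hex reference
        // A literal backslash followed by "u2628" is not an XML escape.
        System.out.println(rootText("\\u2628").equals(cross));  // false
    }
}
```

The first three documents all parse to the same one-character string; the last one parses to the six characters backslash, u, 2, 6, 2, 8.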

It should be absolutely fine: UTF-8 can encode any Unicode character.
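The bytes quoted in the question can be checked with a small sketch (hypothetical class Utf8BytesDemo) that encodes a single code point with StandardCharsets.UTF_8 and prints the result in hex. It also encodes a code point outside the Basic Multilingual Plane, which UTF-8 stores in four bytes.

```java
import java.nio.charset.StandardCharsets;

class Utf8BytesDemo {

    // Returns the UTF-8 encoding of one code point as space-separated hex bytes.
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to print the unsigned value
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex(0x2628));  // E2 98 A8 (CROSS OF LORRAINE)
        System.out.println(utf8Hex(0x1F600)); // F0 9F 98 80 (outside the BMP)

        // Decoding the three bytes from the question yields U+2628 again.
        byte[] raw = {(byte) 0xE2, (byte) 0x98, (byte) 0xA8};
        String decoded = new String(raw, StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(decoded.codePointAt(0))); // 2628
    }
}
```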

XML has some restrictions on control characters (U+0000 to U+001F): XML 1.0 allows only tab (U+0009), line feed (U+000A) and carriage return (U+000D) from that range, and never allows U+0000. U+2628 is not affected by this, so it should be fine.
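The restriction comes from the Char production of the XML 1.0 specification; in XML 1.0 it applies whether a character is written directly or as a character reference such as &#11;. A sketch of that rule as a predicate (hypothetical class XmlCharCheck) might look like this:

```java
class XmlCharCheck {

    // XML 1.0 (Fifth Edition) "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isXml10Char(0x2628)); // true: CROSS OF LORRAINE
        System.out.println(isXml10Char(0x0A));   // true: line feed
        System.out.println(isXml10Char(0x0B));   // false: vertical tab
        System.out.println(isXml10Char(0x00));   // false: NUL is never allowed
    }
}
```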

(Personally I prefer to go to unicode.org for definitive code charts, but U+2628 definitely appears there, in the Miscellaneous Symbols block, U+2600 to U+26FF.)

You shouldn't need to worry about the UTF-8 side of things: you should be able to put the character in your data directly and let JAXB do the encoding.
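With JAXB the output encoding is controlled by the Marshaller.JAXB_ENCODING property, which defaults to UTF-8. Since JAXB is no longer bundled with the JDK, the sketch below shows the same idea with the JDK's javax.xml.transform serializer instead: the character goes into the model as a plain Java string, and the serializer produces the bytes. Class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SerializeDemo {

    // Builds <r>value</r> in memory and lets the serializer write UTF-8 bytes.
    static byte[] toUtf8Xml(String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("r");
            root.setTextContent(value); // no manual escaping or byte handling
            doc.appendChild(root);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Sets both the encoding named in the prolog and the one used for the bytes.
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Parses the bytes back and returns the root element's text.
    static String readBack(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = toUtf8Xml("\u2628");
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println(readBack(xml).equals("\u2628")); // true
    }
}
```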

One more addition:

Just specifying the encoding in the prolog is not sufficient. You need to make sure the content is actually serialized (written out as bytes) using the encoding the prolog declares.
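To show what goes wrong, here is a minimal sketch (hypothetical class EncodingMismatchDemo) in which the prolog keeps saying utf-8 but the bytes are produced with ISO-8859-1, which cannot represent the CROSS OF LORRAINE. String.getBytes silently substitutes that charset's replacement byte, '?', so the parser reads back a question mark without raising any error.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class EncodingMismatchDemo {

    // The prolog always claims utf-8, whatever charset is used to write the bytes.
    static final String XML = "<?xml version=\"1.0\" encoding=\"utf-8\" ?><r>\u2628</r>";

    // Writes XML with the given charset, parses it, and returns the root element's text.
    static String roundTrip(Charset writeCharset) {
        byte[] bytes = XML.getBytes(writeCharset); // unmappable chars become '?'
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(StandardCharsets.UTF_8).equals("\u2628")); // true
        System.out.println(roundTrip(StandardCharsets.ISO_8859_1));             // ?
    }
}
```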
