How to handle special characters in Java?
I want to save a comment entered by the user to the database as a CLOB. That worked fine until I ran into a problem with special characters. If a user copies and pastes a comment from WordPad and it contains a "single quote" or some other special characters (they are slightly different from the usual ones), they are converted into a reversed question mark or a square box. I tried to handle them with the code below.
values[4] = new String(values[4].getBytes("ISO-8859-1"), "UTF-8");
But I'm still getting square boxes. After debugging the issue, I realized that it is not able to handle what looks like a space. Please see the attached image.
Note: the comment length is 122 and only one space failed to convert. I don't know what's wrong with that space.
Note that in Java the encoding matters only when converting between characters and bytes.
Java's String objects are always encoded as UTF-16, so assuming that values is a String[], your code is doing the following:
values[4].getBytes("ISO-8859-1") encodes the characters of values[4] as ISO-8859-1 bytes.
new String(..., "UTF-8") then decodes those bytes as if they were UTF-8 and turns the result back into a set of characters.
E.g. the £ character is converted to the byte value A3, but that single byte cannot be converted back using UTF-8, since in UTF-8 the byte A3 can only appear as a continuation byte inside a multi-byte sequence (£ itself is the 2-byte sequence C2 A3).
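The failure described above can be reproduced in isolation. The following is a minimal sketch, not the asker's actual code: the class name RoundTripDemo and the helpers brokenRoundTrip and codePoints are made up for illustration, and the non-breaking space (U+00A0) is only a guess at what the failing "space" might be, since the question does not say.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Hypothetical helper: the same two steps as the line in the question,
    // written with Charset constants instead of charset names.
    public static String brokenRoundTrip(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // chars -> ISO-8859-1 bytes
        return new String(latin1, StandardCharsets.UTF_8);       // bytes misread as UTF-8
    }

    // Render every char as U+XXXX so that invisible characters become visible.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // pound sign
        String nbsp = "\u00A0";  // non-breaking space (assumed example, not from the question)

        // A lone byte A3 or A0 is malformed UTF-8, so Java substitutes U+FFFD.
        System.out.println(codePoints(brokenRoundTrip(pound)));
        System.out.println(codePoints(brokenRoundTrip(nbsp)));

        // Pure ASCII survives, which is why most of the comment looks fine.
        System.out.println(codePoints(brokenRoundTrip("abc")));
    }
}
```

For a character that ISO-8859-1 cannot represent at all, such as the right single quote (U+2019), the same helper returns a plain ? because the information is already lost in the first step.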
To sum it up: that line of code is completely broken. While you are working with String objects there is no need to think about any kind of encoding. Where you do have to take care of code page issues is when converting to or from bytes, be it during I/O to a file or network stream or when converting to byte arrays for encryption.
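As a rough illustration of that last point, the sketch below names the charset explicitly at the one place where characters become bytes and back again. The class ExplicitCharsetDemo and its two helpers are hypothetical, and UTF-8 is an assumed choice; whichever charset is used, it has to be the same on both sides of the conversion.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharsetDemo {

    // Hypothetical helper: characters -> bytes, with the charset stated explicitly.
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: bytes -> characters, using the SAME charset as encode().
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample text with a right single quote, a pound sign and a non-breaking space.
        String comment = "It\u2019s \u00A3\u00A05";

        byte[] wire = encode(comment);    // what would go to a file, socket or cipher
        String restored = decode(wire);   // what the reading side gets back

        System.out.println(Arrays.toString(wire));
        System.out.println(comment.equals(restored)); // true: nothing was lost
    }
}
```

Inside the program the comment stays a plain String the whole time; a charset appears only in encode and decode.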