
How to handle special characters in Java?

I want to save a comment entered by the user in the DB as a CLOB. That works fine, but I later ran into an issue with special characters. If a user copies and pastes the comment from WordPad and it contains a "single quote" or some other special characters (they look slightly different from the usual ones), they are converted into a reversed question mark or a square box. I tried to handle them using the code below.

values[4] = new String(values[4].getBytes("ISO-8859-1"), "UTF-8");

But I'm still getting square boxes. After debugging the issue, I realized that it is not able to handle a space. Please see the attached image.

Note: the comment length is 122, and it failed to handle only one space. I don't know what's wrong with that space.

Note that in Java the encoding matters only when

  1. doing some sort of (file) I/O, or
  2. converting characters to bytes.

Java's String objects are always encoded as UTF-16, so assuming that values is a String[], your code is doing the following:

  1. Take the String values[4] as a sequence of characters.
  2. Transform each character to one byte using the ISO-8859-1 encoding.
  3. Use the UTF-8 encoding to convert these bytes back to characters.

E.g. the £ character will be converted to the byte value A3, but that single byte cannot be converted back using UTF-8, since in UTF-8 it can only occur as a continuation byte inside a multi-byte sequence.
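The effect of that conversion can be reproduced in isolation. The following is only a minimal sketch: the class name BrokenRoundTrip and the method reencode are made up for illustration, and the input strings are assumed examples rather than the actual comment from the question.

```java
import java.nio.charset.StandardCharsets;

public class BrokenRoundTrip {
    // Hypothetical helper that repeats the conversion from the question:
    // encode the characters as ISO-8859-1, then decode those bytes as UTF-8.
    public static String reencode(String input) {
        byte[] latin1 = input.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign
        byte[] latin1 = pound.getBytes(StandardCharsets.ISO_8859_1);
        System.out.printf("ISO-8859-1 byte: %02X%n", latin1[0] & 0xFF);

        // A lone A3 byte is malformed UTF-8, so the decoder substitutes
        // the replacement character U+FFFD.
        String damaged = reencode(pound);
        System.out.printf("decoded code point: U+%04X%n", (int) damaged.charAt(0));

        // A typographic quote (U+2019) does not exist in ISO-8859-1 at all,
        // so getBytes already replaces it with '?'.
        System.out.println(reencode("\u2019"));
    }
}
```

Plain ASCII text passes through unchanged, which would explain why only a few characters in a long comment end up damaged.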

To sum it up: that line of code is completely broken. While you are working with String objects there is no need to think about any kind of encoding. Where you do have to take care of code page issues is when converting to bytes, be it during I/O to a file or network stream, or when converting to byte arrays for encryption.
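As a rough illustration of where the encoding does matter, the sketch below names the charset explicitly at the String-to-bytes boundary and uses the same one in both directions. The class ExplicitCharset and its methods are hypothetical, and the sample text (a typographic quote plus a non-breaking space) is only an assumption about what a paste from WordPad might contain.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Hypothetical helper: encode with an explicitly named charset
    // instead of relying on the platform default.
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: decode with the same charset used for encoding.
    public static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample: U+2019 (right single quote) and U+00A0 (non-breaking space).
        String comment = "It\u2019s\u00A0ok";
        byte[] bytes = encode(comment);
        String restored = decode(bytes);

        System.out.println("chars: " + comment.length() + ", bytes: " + bytes.length);
        System.out.println("round trip intact: " + comment.equals(restored));
    }
}
```

The same idea carries over to streams: InputStreamReader and OutputStreamWriter both accept a Charset argument, so the encoding can be stated once at the I/O boundary and the rest of the code can work with plain String values.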
