Junk character removal in Java

Question

In a text field, if I paste text copied from Word, junk characters get inserted. The parameters are still fine when they are posted from the JSP page, but when I read the parameter in Java it has turned into junk. I used the following code to eliminate the junk before inserting into the database. I am using a MySQL database (JBoss 5.1 GA server).

String outputEncoding = "UTF-8";

Charset charsetOutput = Charset.forName(outputEncoding);
CharsetEncoder encoder = charsetOutput.newEncoder();
byte[] bufferToConvert = userText.getBytes();
CharsetDecoder decoder =  (CharsetDecoder) charsetOutput.newDecoder();
try {
    CharBuffer cbuf = decoder.decode(ByteBuffer.wrap(bufferToConvert));
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(cbuf));
    userText = decoder.decode(bbuf).toString();
} catch (CharacterCodingException e) {
    e.printStackTrace();
}

But I am still getting junk characters for single quotes ('') and double quotes (""). I need the string in UTF-8. Can anyone suggest where I may be going wrong?

Example: input “esgh”, output â??esghâ??, wanted output “esgh”.

Answer 1

You have to swap the encode and decode calls. Also, you are decoding twice for only one encoding!

You wrote:

CharBuffer cbuf = decoder.decode(ByteBuffer.wrap(bufferToConvert));
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(cbuf));
userText = decoder.decode(bbuf).toString();

But, obviously, it has to be:

ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(userText));
CharBuffer cbuf = decoder.decode(bbuf);
userText = cbuf.toString();

First you have to encode your text, then decode the encoded result.
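For reference, here is a minimal, self-contained sketch of that encode-then-decode order. The class name EncodeThenDecode and the method roundTrip are made up for illustration, and the sample string is the one from the question written with \u escapes so the source file encoding does not matter.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

public class EncodeThenDecode {
    // Encodes the text to UTF-8 bytes, then decodes those bytes back to a String.
    // Throws CharacterCodingException if the text cannot be encoded (for example,
    // an unpaired surrogate).
    public static String roundTrip(String text) throws CharacterCodingException {
        Charset utf8 = Charset.forName("UTF-8");
        CharsetEncoder encoder = utf8.newEncoder();
        CharsetDecoder decoder = utf8.newDecoder();

        ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text)); // chars -> bytes
        CharBuffer chars = decoder.decode(bytes);                 // bytes -> chars
        return chars.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String input = "\u201Cesgh\u201D"; // curly quotes around "esgh"
        String output = roundTrip(input);
        System.out.println(output.equals(input));
    }
}
```

Keep in mind that for text that is already well-formed this round trip returns an equal string, so it works as a validity check. If userText is already garbled when it reaches this code, the cause is probably earlier in the request handling.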

Answer 2

If you copy text from Microsoft Word, its 'Smart Quotes' feature can trip up encoding and decoding. Try using Windows-1252 as the source encoding. Also, I would suggest using String#getBytes(String) and String#String(byte[],Charset) for the conversions; there is no need to mess with buffers at this level.
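As a rough sketch of the String-based conversion suggested above (no buffers), the hypothetical helper below re-encodes a garbled string with the charset it was wrongly decoded with, then decodes the resulting bytes with the charset they were really in. The class and method names are made up for illustration. The demo assumes the damage came from UTF-8 bytes being read as ISO-8859-1, which is only a guess based on the â?? pattern in the question. It uses ISO-8859-1 rather than Windows-1252 because every byte value maps to a character there, so the round trip is lossless. Pass a different charset if your source encoding differs.

```java
import java.nio.charset.Charset;

public class Redecode {
    // Recovers the original bytes by encoding with the charset that was wrongly
    // used for decoding, then decodes those bytes with the charset they were in.
    public static String redecode(String garbled, Charset wronglyUsed, Charset actual) {
        byte[] originalBytes = garbled.getBytes(wronglyUsed);
        return new String(originalBytes, actual);
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        Charset latin1 = Charset.forName("ISO-8859-1");

        String typed = "\u201Cesgh\u201D"; // curly quotes around "esgh"

        // Simulate the assumed damage: UTF-8 bytes decoded as ISO-8859-1.
        String garbled = new String(typed.getBytes(utf8), latin1);

        String repaired = redecode(garbled, latin1, utf8);
        System.out.println(repaired.equals(typed));
    }
}
```

This only helps if the garbled string still holds all of the original bytes. Once a character has been replaced by a literal ?, the information is gone and the fix has to happen where the parameter is first decoded.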

Answer 3

The answer by Martijn Courteaux should give you the expected output. But also try setting the server's CHARACTER SET and COLLATION to UTF-8.

I hope it will work.

Answer 4

Please check whether the web/application server is sending the correct data.

Which web/application server are you using?

Are you using a simple text field or something else?
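One way to check what the server actually hands to your Java code is to print the code points of the received parameter instead of the string itself, since consoles and logs can garble the output a second time. The sketch below is only an illustration; the class name CodePointDump and the method dump are made up.

```java
public class CodePointDump {
    // Formats each code point of the text as U+XXXX, separated by single spaces.
    public static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", codePoint));
            i += Character.charCount(codePoint); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly received left curly quote followed by the letter e.
        System.out.println(dump("\u201Ce")); // prints "U+201C U+0065"
    }
}
```

If a pasted curly quote shows up as U+201C or U+201D, the parameter arrived intact. If it shows up as three code points starting with U+00E2, the bytes were decoded with the wrong charset before your code saw them.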

4 answers

Answer 1: score 5, 2012-07-24 10:32:19
Answer 2: score 1, 2012-08-05 21:59:48
Answer 3: score 0, 2012-07-30 13:04:08
Answer 4: score 0, 2012-08-04 16:20:32
