
Junk character removal in Java

When I copy text from Word into a text field, junk characters get inserted. The parameters are fine when they are posted from the JSP page, but when I read the parameter in Java it has turned into junk. I used the following code to eliminate the junk before inserting the text into the database. I am using a MySQL database (JBoss 5.1 GA server).

String outputEncoding = "UTF-8";

Charset charsetOutput = Charset.forName(outputEncoding);
CharsetEncoder encoder = charsetOutput.newEncoder();
byte[] bufferToConvert = userText.getBytes();
CharsetDecoder decoder =  (CharsetDecoder) charsetOutput.newDecoder();
try {
    CharBuffer cbuf = decoder.decode(ByteBuffer.wrap(bufferToConvert));
    ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(cbuf));
    userText = decoder.decode(bbuf).toString();
} catch (CharacterCodingException e) {
    e.printStackTrace();
}

But I am still getting junk characters for single quotes ('') and double quotes (""). I need the string in UTF-8. Can anyone suggest where I may be going wrong?

Example: input: “esgh”. Output: â??esghâ??. Wanted output: “esgh”.

You have to swap the encode and decode calls. Also, you are decoding twice but encoding only once!

You wrote:

CharBuffer cbuf = decoder.decode(ByteBuffer.wrap(bufferToConvert));
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(cbuf));
userText = decoder.decode(bbuf).toString();

But, obviously, it has to be:

ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(userText));
CharBuffer cbuf = decoder.decode(bbuf);
userText = cbuf.toString();

First, you have to encode your text, then decode the encoded result.
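As a self-contained illustration of that ordering, here is a minimal sketch. The class name `Utf8RoundTrip` and the method `roundTrip` are made up for this example, and `StandardCharsets` assumes Java 7 or later. Note that for well-formed text a round trip through a single charset returns an equal string, so this mainly checks that the text can be encoded; it may not repair text that was already corrupted before it reached this code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question or answer.
class Utf8RoundTrip {

    // Encode first (chars -> UTF-8 bytes), then decode (UTF-8 bytes -> chars).
    static String roundTrip(String text) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        try {
            ByteBuffer encoded = encoder.encode(CharBuffer.wrap(text));
            CharBuffer decoded = decoder.decode(encoded);
            return decoded.toString();
        } catch (CharacterCodingException e) {
            // Raised for input that cannot be encoded, e.g. an unpaired surrogate.
            throw new IllegalArgumentException("Text is not encodable as UTF-8", e);
        }
    }

    public static void main(String[] args) {
        String input = "\u201Cesgh\u201D"; // esgh wrapped in curly double quotes
        System.out.println(input.equals(roundTrip(input))); // prints true
    }
}
```

If the string is already garbled when it enters `roundTrip`, it comes back garbled in the same way, which suggests looking at where the bytes were first decoded.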

If you copy text from Microsoft Word, its 'Smart Quotes' feature can trip up encoding and decoding. Try using Windows-1252 as the source encoding. Also, I would suggest using String#getBytes(String) and the String(byte[], Charset) constructor for the conversions; there is no need to mess with buffers at this level.
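A rough sketch of that suggestion, assuming the incoming bytes really are Windows-1252 (which is a guess about the setup, not something the question confirms). The class `SmartQuoteTranscoder` and the method `toUtf8` are hypothetical names; the example uses the `getBytes(Charset)` overload rather than `getBytes(String)` to avoid the checked exception.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class SmartQuoteTranscoder {

    // Assumed source encoding of text pasted from Word.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Decode the raw bytes with the source charset, then re-encode as UTF-8.
    static byte[] toUtf8(byte[] sourceBytes, Charset sourceCharset) {
        String text = new String(sourceBytes, sourceCharset);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0x93 and 0x94 are the curly double quotes in Windows-1252.
        byte[] fromWord = {(byte) 0x93, 'e', 's', 'g', 'h', (byte) 0x94};
        byte[] utf8 = toUtf8(fromWord, WINDOWS_1252);
        String text = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(text.equals("\u201Cesgh\u201D")); // prints true
    }
}
```

The key point is that the charset passed to the String constructor has to match the encoding the bytes were actually written in; decoding the same six bytes as UTF-8 would not yield the curly quotes.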

The answer by Martijn Courteaux should give you the expected output. Also check the server's CHARACTER SET and COLLATION settings and set them to UTF-8.

I hope that works.

Please check whether the web/application server is sending the correct data.

Which web/application server are you using?

Are you using a simple text field or some other kind of input?
