Text encoding produces junk characters in the Play! 1.2.4 framework
Issue: Text turns into junk characters in the Play! 1.2.4 framework.
Context: We are trying to store the text "《我叫MT繁體版》台港澳專屬伺服器上線!" from an input text field to MySQL using the Play! 1.2.4 framework.
Steps that we followed:
1) UI to get the input from the user. Text in any language is allowed, so we tried Japanese characters. Note: the page is set to UTF-8 character encoding.
2) After submission to the Play! controller, the controller just reads the input and stores it using a Play! model. The snippet is shown below:
public static void text_create() throws UnsupportedEncodingException,
        ParseException {
    System.out.println("params :: text string value :: " + params.get("text"));
    String oldString = params.get("text");
    // Converting the input string (which is in UTF-8 format) and parsing to Windows-1252
    String newString = new String(oldString.getBytes(), "WINDOWS-1252");
    // 1. passing encoded text to mysql.
    // 2. TextCheck table and the column 'text' has encoding and collation format as UTF-8.
    // 3. TextCheck > text column mentioned as String in model.
    TextCheck a = new TextCheck(newString);
    List<Object> text = TextCheck.TextList();
    render(a, text);
}
The TEXT value is stored as "《æˆ'å «MTç¹ é«”ç‰ˆã€‹å °æ¸¯æ¾³å°ˆå±¬ä¼ºæœ å™¨ä¸Šç·šï¼ "
The problem is that there are stray characters in the middle of the value. When I read this raw data from MySQL using other platforms such as Java, Ruby, or some other language, it converts, but those characters come out as junk.
Note: Interestingly, when I read it from the same Play! framework, it all looks fine; even those junk characters are read correctly.
Question: Why those junk characters?
The problem is the following line:
String newString = new String(oldString.getBytes(), "WINDOWS-1252");
This looks like nonsense to me. Java stores all strings internally using UTF-16, so you can't adjust the encoding of a Java string in the manner you've attempted here.
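As a rough illustration of that point, the sketch below encodes one character from the question's input with two different charsets. The class name StringBytesDemo and the helper encodedLength are hypothetical, chosen only for this example. The string itself never changes; only the byte sequence produced from it depends on the charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringBytesDemo {

    // Hypothetical helper: number of bytes the text occupies once encoded with the given charset.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u6211"; // the character 我 from the question's input

        // The String holds one UTF-16 code unit regardless of any charset.
        System.out.println("chars: " + text.length());

        // An encoding only comes into play when the String is turned into bytes.
        System.out.println("UTF-8 bytes: " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE bytes: " + encodedLength(text, StandardCharsets.UTF_16BE));
    }
}
```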
The getBytes() method returns the bytes of the string using the default platform encoding. You then convert these bytes into a new string using a (probably) different charset. The result is almost certain to be broken.
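To make the failure concrete, here is a minimal sketch of that round trip. It pins the encoding step to UTF-8 instead of relying on the platform default, which is an assumption about the asker's environment. The class name MojibakeDemo and the helpers misdecode and repair are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Mirrors the question's conversion, assuming the default platform encoding is UTF-8:
    // encode to UTF-8 bytes, then decode those bytes as if they were Windows-1252.
    public static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, WINDOWS_1252);
    }

    // Reverses the conversion: recover the original bytes, then decode them as UTF-8.
    public static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(WINDOWS_1252);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u6211"; // the character 我 from the question's input

        // One character becomes three unrelated ones (the same "æˆ‘"-style junk as in the stored value).
        String garbled = misdecode(original);
        System.out.println(garbled.length());

        // Reversing the two steps restores this particular character.
        System.out.println(repair(garbled).equals(original));
    }
}
```

The reversal only works when every byte survives the wrong decoding. Windows-1252 leaves a few byte values (0x8F, for example) unassigned, and Java's decoder substitutes a replacement character for them, so text stored this way cannot always be recovered. Passing the string from params.get("text") to the model unchanged is likely the simpler course.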