Text encoding produces junk characters in Play! 1.2.4 framework

Issue: Character encoding in the Play! 1.2.4 framework becomes garbled.

Context: We are trying to store the text "《我叫MT繁體版》台港澳專屬伺服器上線!" from an input text field into MySQL using the Play! 1.2.4 framework.

Steps that we followed:

1) A UI gets the input from the user. It can be text in any language, so we tried Japanese characters. Note: the page is set to UTF-8 character encoding.

2) The form is posted to a Play! controller, which just reads the input and stores it using a Play! model. The snippet is shown below:

public static void text_create() throws UnsupportedEncodingException,
        ParseException {
    System.out.println("params :: text string value :: "    + params.get("text"));

    String oldString = params.get("text");

    // Converting the input string (which is in UTF-8 format) and parsing it as Windows-1252
    String newString = new String(oldString.getBytes(), "WINDOWS-1252");        

    // 1. passing encoded text to mysql. 
    // 2. TextCheck table and the column 'text' has encoding and collation format as UTF-8.
    // 3. TextCheck > text column mentioned as String in model.
    TextCheck a = new TextCheck(newString);

    List<Object> text = TextCheck.TextList();
    render(a,text);
}

It is stored as a TEXT value of "《æˆ'å «MTç¹ é«”ç‰ˆã€‹å °æ¸¯æ¾³å°ˆå±¬ä¼ºæœ å™¨ä¸Šç·šï¼ "

The problem is that there are extra characters inside the stored value. When I read this raw data from MySQL using other platforms such as Java, Ruby, or some other language, the text is converted but those characters come out as junk.

Note: Interestingly, when I read it back from the same Play! framework, it all looks fine; even those junk characters are read correctly.

Question: Why are those junk characters there?

The problem is the following line:

String newString = new String(oldString.getBytes(), "WINDOWS-1252");

This looks like nonsense to me. Java stores all strings internally using UTF-16, so you can't adjust the encoding of a Java string in the manner you've attempted here.
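To illustrate that point, here is a minimal sketch in plain Java. The class name StringEncodingBoundary and the helper roundTrip are made up for this example and are not part of Play! or of the code in the question. It suggests that a charset only matters when a string is turned into bytes or back, not while it is held as a String.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StringEncodingBoundary {
    // Hypothetical helper: encode to bytes, then decode again with the SAME charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "我叫MT繁" written with Unicode escapes, so the encoding of this
        // source file does not affect the demonstration.
        String text = "\u6211\u53EBMT\u7E41";

        // The String holds UTF-16 code units, whatever charset the input used.
        System.out.println(text.length()); // 5

        // A charset only comes into play when converting to or from bytes.
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);    // 11
        System.out.println(text.getBytes(StandardCharsets.UTF_16BE).length); // 10

        // Same charset both ways is lossless when the charset can represent the text.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true

        // windows-1252 has no Chinese characters, so encoding replaces them with '?'.
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(roundTrip(text, cp1252).equals(text)); // false
    }
}
```

In other words, there is no "UTF-8 String" or "Windows-1252 String" to convert between; only byte arrays have an encoding.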

The getBytes() method returns the bytes of the string using the default platform encoding. You then convert these bytes into a new string using a (probably) different charset. The result is almost certain to be broken.
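The effect can be reproduced outside Play! with a small sketch. It assumes the platform default encoding was UTF-8 (the question does not say), so UTF-8 is passed explicitly; the class MojibakeDemo and the methods misdecode and undo are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Mimics the line from the question, with UTF-8 ASSUMED as the platform default.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), WINDOWS_1252);
    }

    // Tries to reverse the damage by applying the two charsets the other way round.
    static String undo(String garbled) {
        return new String(garbled.getBytes(WINDOWS_1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String wo = "\u6211";   // "我", UTF-8 bytes E6 88 91
        String jiao = "\u53EB"; // "叫", UTF-8 bytes E5 8F AB

        // Every UTF-8 byte is turned into one separate character.
        System.out.println(misdecode(wo).length()); // 3

        // All three bytes of "我" exist in windows-1252, so this one can be undone.
        System.out.println(undo(misdecode(wo)).equals(wo)); // true

        // Byte 0x8F is unassigned in windows-1252, so Java decodes it to U+FFFD
        // and the original byte is lost for good.
        System.out.println(misdecode(jiao).indexOf('\uFFFD')); // 1
        System.out.println(undo(misdecode(jiao)).equals(jiao)); // false
    }
}
```

Under these assumptions, the gaps in the stored value would correspond to bytes that have no windows-1252 character, which is why other platforms cannot recover the text. Passing the value of params.get("text") to the TextCheck constructor unchanged, without the getBytes()/new String() step, should avoid the corruption, provided the request and the database connection are both handled as UTF-8.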
