
Taking Japanese or Chinese input from System.in in Java

I am trying to read Japanese characters for a small echo server I wrote. The problem is that when I get the characters from System.in (via anything: Scanner, InputStream, you name it), they always come in as garbage. I even tried using

message = new String(bufferedReader.readLine().getBytes("UTF8"));

in order to try to get the bytes to come in as Unicode.

When I print a message from the server, ようこそ ("welcome" in Japanese), it displays fine; the problem only occurs when taking user input.

The Eclipse console is set up to use UTF-8.

Here is a small test program I wrote to confirm that the problem is the input from System.in.

The input and output are:

よ
よ

And here is the code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class TestUnicode {

    public static void main(String[] args) throws IOException
    {
        BufferedReader stdIn = new BufferedReader(new InputStreamReader(System.in, "UTF8"));
        String message = stdIn.readLine();
        System.out.println(message);
    }

}

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.Socket;

public class Client {

    public static void main(String[] args) throws IOException
    {
        Socket serverSocket = null;

        try
        {
            serverSocket = new Socket("192.168.1.127", 3000); //connect to myself at port 3000
        }
        catch(IOException e)
        {
            System.out.println(e);
            System.exit(1);
        }

        BufferedReader in = null;
        PrintStream out = null;
        try //create in and out to write and read from echo
        {
            in = new BufferedReader(new InputStreamReader(serverSocket.getInputStream()));
            out = new PrintStream(serverSocket.getOutputStream(), true);
        }
        catch(IOException e)
        {
            serverSocket.close();
            System.out.println(e);
            System.exit(1);
        }

        String message = in.readLine();
        System.out.println(message); //print out the welcome message

        //create a new buffered reader from my input
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));

        try
        {
            while(true)
            {
                message = bufferedReader.readLine();
                out.println(message); //send a line to the server
                if(message.equals("quit"))
                {
                    System.out.println(in.readLine());
                    break;
                }
                System.out.println(in.readLine()); //get it back and print it
            }

            System.out.println("Quitting client...");
        }
        catch(IOException e)
        {
            in.close();
            out.close();
            serverSocket.close();
            System.out.println(e);
            System.exit(1);
        }

        in.close();
        out.close();
        serverSocket.close();
    }
}

I presume you are using Windows.
The problem is that the DOS prompt uses a completely different character encoding than UTF-8. For Japanese it would be Shift-JIS, so trying to read that input through a UTF-8 InputStreamReader will not work.
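To see why the mismatch produces garbage, the sketch below encodes ようこそ as Shift-JIS (standing in for the bytes a Japanese-locale command prompt would send; that byte source is an assumption) and then decodes those bytes once as UTF-8 and once as Shift-JIS. It assumes the JDK includes the Shift_JIS charset, which standard full JDK builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // "youkoso" (welcome) as Unicode escapes, so the source file encoding does not matter
    static final String WELCOME = "\u3088\u3046\u3053\u305D";

    // Simulates the bytes a Shift-JIS console would hand to System.in (assumption)
    static byte[] consoleBytes() {
        return WELCOME.getBytes(Charset.forName("Shift_JIS"));
    }

    // What the program sees if it assumes the bytes are UTF-8
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // What the program sees if it uses the console's real encoding
    static String decodeAsShiftJis(byte[] bytes) {
        return new String(bytes, Charset.forName("Shift_JIS"));
    }

    public static void main(String[] args) {
        byte[] bytes = consoleBytes();
        // Print booleans only, so the result does not depend on the terminal's encoding
        System.out.println("UTF-8 decode matches:     " + decodeAsUtf8(bytes).equals(WELCOME));
        System.out.println("Shift-JIS decode matches: " + decodeAsShiftJis(bytes).equals(WELCOME));
    }
}
```

Running it prints `false` for the UTF-8 decode and `true` for the Shift-JIS decode: the bytes themselves are fine, only the charset used to interpret them is wrong.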

Fortunately, there is hope. Instead of System.in you could (and should) use System.console(). It returns an instance of the Console class with the correct character-encoding conversion in place. Be aware, however, that this won't work when you run or debug it inside an IDE (especially Eclipse), because the IDE does not attach a console and System.console() returns null.

The corrected code (I am sure it works, though I haven't tested it):

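import java.io.Console;

public class TestUnicode {

    public static void main(String[] args)
    {
        Console console = System.console();
        if (console == null) { // no console attached, e.g. inside Eclipse
            System.err.println("No console available");
            return;
        }
        String message = console.readLine();
        console.writer().println(message);
    }
}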

Note that you also need to use Console to print messages. Why? Because the character encoding has to be converted in both directions. The DOS prompt still uses the legacy encoding, and there is no way to change that.
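Because System.console() returns null inside Eclipse and other IDEs, one option is to fall back to an explicitly decoded System.in when no console is attached. The sketch below does that; the helper name readLine and the choice of UTF-8 for the fallback (matching the Eclipse console setting from the question) are assumptions, not part of the Console API.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsoleOrFallback {

    // Hypothetical helper: prefer the Console, which converts the console encoding itself;
    // otherwise decode the fallback stream with the charset the caller specifies.
    // A new BufferedReader per call may buffer past the first line, so reuse one
    // reader if you need to read repeatedly from the fallback stream.
    static String readLine(Console console, InputStream fallback, Charset fallbackCharset)
            throws IOException {
        if (console != null) {
            return console.readLine();
        }
        BufferedReader reader = new BufferedReader(new InputStreamReader(fallback, fallbackCharset));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        // Inside an IDE System.console() is null, so this reads System.in as UTF-8
        String line = readLine(System.console(), System.in, StandardCharsets.UTF_8);
        System.out.println(line == null ? "(no input)" : "Read " + line.length() + " chars");
    }
}
```

Printing the line length rather than the text keeps the check independent of how the output side is encoded.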

When you create your InputStreamReader, you should specify the charset to use:

new InputStreamReader(System.in, "UTF-8")

This also applies to your socket streams.

If you don't do that, then the default charset (encoding) will be used. You can also change the default by adding -Dfile.encoding=UTF-8 as a VM argument.
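To check which default the JVM actually picked (and whether a -Dfile.encoding=UTF-8 argument took effect), a small diagnostic like the one below can help. It only reports values and changes nothing; the class and method names are illustrative.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {

    // The charset InputStreamReader, OutputStreamWriter and PrintStream use when none is given
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // The system property that the -Dfile.encoding VM argument sets
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

Running it with `java -Dfile.encoding=UTF-8 DefaultCharsetCheck` should report UTF-8; without the flag it shows whatever the platform default is.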

Regarding your test program, System.out.println also uses the default charset, so it can mess up your string even if it was read correctly. So unless you change the default charset, you can use something like this to print out the string:

final OutputStreamWriter w = new OutputStreamWriter(System.out, "UTF-8");
w.write(message);
w.flush();
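If you would rather keep calling System.out.println, a variant of the same idea is to wrap System.out once in a PrintStream that encodes as UTF-8 and install it with System.setOut. This is a sketch; whether the characters then display correctly still depends on the terminal actually interpreting its output as UTF-8 (as the Eclipse console in the question is configured to).

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Stdout {

    // Wraps any byte stream in an auto-flushing PrintStream that always encodes as UTF-8
    static PrintStream utf8(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Install once; the inner System.out only passes the already-encoded bytes through
        System.setOut(utf8(System.out));
        System.out.println("\u3088\u3046\u3053\u305D"); // "youkoso" (welcome)
    }
}
```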

I modified your class like this:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;

public class TestUnicode {

    public static void main(String[] args) throws IOException {
        // Decode the keyboard input as UTF-8 instead of the platform default
        BufferedReader stdIn = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
        String message = stdIn.readLine();
        if (message == null) {
            return; // no input
        }
        // Encode the output as UTF-8 too, rather than relying on the platform default
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(message);
    }
}

I ran it in a console and got the desired output.

So in your case, I'd suggest specifying the character encoding when you create your BufferedReader (through its InputStreamReader) and your PrintStream.
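Applied to the Client from the question, that means passing a charset wherever it builds a reader or a PrintStream. The sketch below pulls that into two hypothetical helpers and, so it can run without a server, round-trips a line through an in-memory buffer standing in for the socket. It assumes the server side also reads and writes UTF-8; both ends must agree on the encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class Utf8SocketStreams {

    // Hypothetical helper: use as utf8Reader(serverSocket.getInputStream())
    static BufferedReader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Hypothetical helper: use as utf8Writer(serverSocket.getOutputStream())
    static PrintStream utf8Writer(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8"); // autoflush, like the original client
    }

    // Writes one line with utf8Writer and reads it back with utf8Reader
    static String roundTrip(String line) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
        utf8Writer(wire).println(line);
        return utf8Reader(new ByteArrayInputStream(wire.toByteArray())).readLine();
    }

    public static void main(String[] args) throws IOException {
        String sent = "\u3088\u3046\u3053\u305D"; // "youkoso" (welcome)
        System.out.println("Round trip intact: " + sent.equals(roundTrip(sent)));
    }
}
```

Running it prints `Round trip intact: true`. The same utf8Reader(System.in) call would cover the keyboard input, provided the console really sends UTF-8.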

Note: when I ran it in an IDE, it printed '?' for the Japanese character, so I recommend running it in a console.


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.

 