Taking Japanese or Chinese input from System.in in Java

I am trying to accept Japanese characters in a small echo server I wrote. The problem is that when I read the characters from System.in (via anything: Scanner, InputStream, you name it), they always come in as garbage. I even tried using

message = new String(bufferedReader.readLine().getBytes("UTF8"));

to try to get the bytes to come in as Unicode.

When I print a message from the server, ようこそ ("welcome" in Japanese), it displays fine; the problem exists only when taking user input.

The console is set up to use UTF-8 in Eclipse.

Here is a small test program I wrote to confirm that the problem is the input from System.in.

The input and output are:

よ
よ

And here is the code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class TestUnicode {

    public static void main(String[] args) throws IOException
    {
        BufferedReader stdIn = new BufferedReader(new InputStreamReader(System.in, "UTF8"));
        String message = stdIn.readLine();
        System.out.println(message);
    }

}

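import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.Socket;

public class Client {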

public static void main(String[] args) throws IOException 
{
    Socket serverSocket = null;

    try
    {
        serverSocket = new Socket("192.168.1.127", 3000); //connect to myself at port 3000
    }
    catch(IOException e)
    {
        System.out.println(e);
        System.exit(1);
    }

    BufferedReader in = null;
    PrintStream out = null;     
    try //create in and out to write and read from echo
    {
        in = new BufferedReader(new InputStreamReader(serverSocket.getInputStream()));
        out = new PrintStream(serverSocket.getOutputStream(), true);
    }
    catch(IOException e)
    {
        serverSocket.close();
        System.out.println(e);
        System.exit(1);
    }

    String message = null;
    message = in.readLine();
    System.out.println(message); //print out the welcome message

    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));
    //create a new buffered reader from my input

    try
    {
        while(true)
        {
            message = bufferedReader.readLine();
            out.println(message); //send a line to the server
            if(message.equals("quit"))
            {
                System.out.println(in.readLine());
                break;
            }
            System.out.println(in.readLine()); //get it back and print it               
        }

        System.out.println("Quiting client...");
    }
    catch(IOException e)
    {
        in.close();
        out.close();
        serverSocket.close();
        System.out.println(e);
        System.exit(1);
    }

    in.close();
    out.close();
    serverSocket.close();
}
}

I presume you are using Windows.
The problem is that the DOS prompt uses a completely different character encoding than UTF-8. For Japanese it would be Shift-JIS, so trying to read that input with a UTF-8 InputStreamReader will not work.
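
The mismatch can be reproduced without a Windows console. The following is a minimal sketch, assuming the console sends よ (U+3088) as the Shift-JIS byte pair 0x82 0xE6. The class name EncodingMismatchDemo and its decode helper are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {

    // Assumption: the bytes a Shift-JIS console sends for U+3088 (hiragana "yo").
    static final byte[] SHIFT_JIS_YO = {(byte) 0x82, (byte) 0xE6};

    // Decodes raw bytes with the given charset, as an InputStreamReader would.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Wrong charset: the bytes are not valid UTF-8, so they decode to
        // U+FFFD replacement characters instead of the original character.
        String wrong = decode(SHIFT_JIS_YO, StandardCharsets.UTF_8);
        System.out.println("UTF-8 result contains U+FFFD: " + (wrong.indexOf('\uFFFD') >= 0));

        // Matching charset: Shift_JIS is an extended charset, so first check
        // that this Java runtime actually ships it.
        if (Charset.isSupported("Shift_JIS")) {
            String right = decode(SHIFT_JIS_YO, Charset.forName("Shift_JIS"));
            System.out.println("Shift_JIS result is U+3088: " + right.equals("\u3088"));
        }
    }
}
```

The point of the sketch is that the bytes themselves are intact; only the charset chosen for decoding decides whether they turn into the intended character or into garbage.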

Fortunately, there is hope. Instead of System.in you could (and should) use System.console(). It returns an instance of the Console class with the correct character encoding conversion in place. Be aware, however, that trying to debug this from an IDE (especially Eclipse) won't work, because the IDE does not attach a console and System.console() returns null. Oops.

The corrected code (which I expect to work, but have not tested):

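import java.io.Console;

public class TestUnicode {

    public static void main(String[] args) {
        // System.console() returns null when no console is attached (e.g. inside Eclipse).
        Console console = System.console();
        if (console == null) {
            System.err.println("No console attached; run this from a terminal.");
            return;
        }
        String message = console.readLine();
        console.writer().println(message);
    }
}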

Please note that you also need to use Console to print messages. Why? Because the character encoding has to be converted in both directions. The DOS prompt remains in its legacy encoding unless its code page is changed.
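
Because System.console() can be null, a program that must also run inside an IDE needs a fallback. The following is a minimal sketch of one possible approach, not part of the original answer: the class ConsoleOrStdin and its readLine helper are hypothetical, and the choice of UTF-8 for the fallback stream is an assumption about how the IDE console is configured.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ConsoleOrStdin {

    // Reads one line from the console when one is attached; otherwise decodes
    // the fallback stream with the charset the caller says it is encoded in.
    static String readLine(Console console, InputStream fallback, Charset fallbackCharset) {
        if (console != null) {
            return console.readLine();
        }
        try {
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(fallback, fallbackCharset));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Console console = System.console();
        // Assumption: with no console attached (e.g. in an IDE), stdin carries UTF-8.
        String message = readLine(console, System.in, StandardCharsets.UTF_8);
        if (console != null) {
            console.writer().println(message);
        } else {
            System.out.println(message);
        }
    }
}
```

This helper reads a single line. A program that reads many lines should create the fallback BufferedReader once and reuse it, since each new reader may buffer input that later readers never see.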

When you create your InputStreamReader, you should specify the charset to use:

new InputStreamReader(System.in, "UTF-8")

This also applies to your socket streams.
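
For the socket streams, this could look roughly as follows. It is a sketch under assumptions: the class Utf8SocketStreams and its methods are hypothetical, and an in-memory buffer stands in for the network so the example runs without a server. In the client, the same wrappers would be applied to serverSocket.getInputStream() and serverSocket.getOutputStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class Utf8SocketStreams {

    // Wraps a stream such as socket.getInputStream() so its bytes are decoded as UTF-8.
    static BufferedReader reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Wraps a stream such as socket.getOutputStream() so text is encoded as UTF-8.
    // The second argument enables autoflush on println.
    static PrintStream writer(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required on every Java platform", e);
        }
    }

    // Sends one line through an in-memory buffer (standing in for the network)
    // and reads it back with the matching charset.
    static String roundTrip(String line) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writer(wire).println(line);
        try {
            return reader(new ByteArrayInputStream(wire.toByteArray())).readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The welcome message from the question, written with Unicode escapes.
        String welcome = "\u3088\u3046\u3053\u305d";
        System.out.println("round trip preserved the text: " + welcome.equals(roundTrip(welcome)));
    }
}
```

Both ends of the connection have to agree on the charset; fixing only the client or only the server just moves the corruption to the other side.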

If you don't do that, the default charset (encoding) will be used. You can also change the default by adding -Dfile.encoding=UTF-8 as a VM argument.
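
To see which default is actually in effect, a small check like the one below may help. It is only a sketch; the class name DefaultCharsetInfo and the describe method are hypothetical, and the values printed depend on the JVM version and platform.

```java
import java.nio.charset.Charset;

final class DefaultCharsetInfo {

    // Reports the charset Java falls back to when none is specified,
    // together with the raw value of the file.encoding system property.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once normally and once with -Dfile.encoding=UTF-8 shows whether the VM argument changes the default on a given setup.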

Regarding your test program, System.out.println also uses the default charset, so it can garble your string even if it was read correctly. Unless you change the default charset, you can use something like this to print the string:

final OutputStreamWriter w = new OutputStreamWriter(System.out, "UTF-8");
w.write(message);
w.flush();
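
An alternative with the same effect is to wrap System.out in a PrintStream that has an explicit encoding, which keeps println available. This is a sketch, not the answer's own code: the class Utf8Out and its wrap and encode helpers are hypothetical, and encode exists only to make the produced bytes visible.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class Utf8Out {

    // Wraps any byte stream (System.out in practice) in a PrintStream
    // that encodes text as UTF-8 regardless of the default charset.
    static PrintStream wrap(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required on every Java platform", e);
        }
    }

    // Returns the bytes produced by printing the text through the wrapper.
    static byte[] encode(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = wrap(sink);
        ps.print(text);
        ps.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        PrintStream utf8Out = wrap(System.out);
        utf8Out.println("\u3088"); // U+3088, hiragana "yo"
    }
}
```

Whether the character then displays correctly still depends on the terminal interpreting the bytes as UTF-8.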

I modified your class this way:

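import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

public class TestUnicode {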

    /**
     * @param args
     */
    public static void main(String[] args) {
        BufferedReader stdIn = null;
        try {
            stdIn = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
        } catch (UnsupportedEncodingException e1) {
            e1.printStackTrace();
        }
        String message = "";
        try {
            message = stdIn.readLine();
        } catch (IOException e) {
            e.printStackTrace();
        }
        try {
            System.out.println(new String(message.getBytes("UTF-8")));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}

and ran it in a console, which produced the desired output.

So in your case, I'd suggest specifying the character encoding when you create your BufferedReader and PrintStream.

Note: I tried running it in an IDE and it output '?' for that Japanese character, so I recommend running it in a console.

