How can I make System.in Input Stream read utf-8 characters?

This is my code:

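import java.io.InputStream;
import java.util.Scanner;

public class MyTestClass {
    public static void main(String[] args) throws Exception {
        // First input line: read and decoded by Scanner
        Scanner scanner = new Scanner(System.in);
        String s = scanner.nextLine();
        // Second input: a single raw byte read straight from System.in
        InputStream inputStream = System.in;
        int read = inputStream.read();
        System.out.println(read);
        System.out.println((char)read);
        System.out.println(s);
    }
}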

I input the letter ğ twice when I run the program. The console output is:

ğ
ğ
196
Ä
ğ

How can I see the correct letter instead of Ä? Scanner seems to do the right thing.

And actually, why doesn't this approach work? What is wrong here?

The javadoc for InputStream#read() states:

Reads the next byte of data from the input stream.

But as it turns out, the character ğ requires 2 bytes for representation in UTF-8. You therefore need to read two bytes. You can use InputStream#read(byte[]).

byte[] buffer = new byte[2];
inputStream.read(buffer); // may read fewer than 2 bytes; check the returned count

Once the byte array contains the appropriate bytes, you need to decode them as UTF-8. You can do that with

char val = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(buffer)).get();

The variable val will now contain the decoded character.

Note that some UTF-8 encoded characters need only one byte for representation (and others need three or four), so you should only do what we just did if you know how many bytes you need. Otherwise, read everything and pass it to the decoder.
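To illustrate the "read everything and pass it to the decoder" option, here is a minimal sketch. The class name DecodeAllBytes and the helper readAllAsUtf8 are made up for this example, a ByteArrayInputStream stands in for System.in, and InputStream#readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class DecodeAllBytes {
    // Hypothetical helper: reads every byte until end of stream,
    // then decodes the whole array as UTF-8 in one step.
    static String readAllAsUtf8(InputStream in) throws IOException {
        byte[] bytes = in.readAllBytes(); // assumes Java 9+
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // 'a' is 1 byte (0x61), 'ğ' is 2 bytes (0xC4 0x9F), '€' is 3 bytes (0xE2 0x82 0xAC).
        byte[] input = {0x61, (byte) 0xC4, (byte) 0x9F, (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        String text = readAllAsUtf8(new ByteArrayInputStream(input));

        System.out.println(text.length()); // 3 characters decoded from 6 bytes
        // Print code points as numbers so the result does not depend on the console encoding.
        text.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Because the decoder sees the whole byte sequence, it works out on its own how many bytes each character occupies.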

InputStream.read() returns the next byte of data, which is a number between 0 and 255 (or -1 at the end of the stream).

Here, you are simply casting that single byte to char, which in your case gives Ä: 196 (0xC4) is only the first of the two bytes that encode ğ in UTF-8.
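The byte values involved can be checked with a short sketch. The class name FirstByteDemo and the helper firstUtf8Byte are invented for this illustration, and the letter is written as a Unicode escape so the source file's encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

class FirstByteDemo {
    // Hypothetical helper: unsigned value (0..255) of the first UTF-8 byte of the text,
    // i.e. what a single InputStream.read() call would return for it.
    static int firstUtf8Byte(String text) {
        return text.getBytes(StandardCharsets.UTF_8)[0] & 0xFF;
    }

    public static void main(String[] args) {
        String g = "\u011F"; // the letter ğ
        byte[] utf8 = g.getBytes(StandardCharsets.UTF_8);

        System.out.println(utf8.length);    // 2
        System.out.println(utf8[0] & 0xFF); // 196 (0xC4)
        System.out.println(utf8[1] & 0xFF); // 159 (0x9F)

        // Casting the first byte alone yields U+00C4 ('Ä'), not U+011F ('ğ').
        char wrong = (char) firstUtf8Byte(g);
        System.out.println((int) wrong);    // 196
    }
}
```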

Scanner, on the other hand, decodes the bytes into characters as it reads the whole line, and that's why you see it output properly. I suggest you use Scanner rather than a plain InputStream, since it offers convenient methods for reading text.
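If you would rather not depend on the platform default, Scanner also has a constructor that takes a charset name. The sketch below assumes the input really is UTF-8. The class Utf8ScannerDemo and the helper firstLine are hypothetical names, and a ByteArrayInputStream stands in for System.in.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class Utf8ScannerDemo {
    // Hypothetical helper: reads the first line, decoding the bytes as UTF-8.
    static String firstLine(InputStream in) {
        Scanner scanner = new Scanner(in, "UTF-8");
        return scanner.nextLine();
    }

    public static void main(String[] args) {
        // Stand-in for System.in: the UTF-8 bytes of "ğ" followed by a newline.
        byte[] typed = "\u011F\n".getBytes(StandardCharsets.UTF_8);
        String line = firstLine(new ByteArrayInputStream(typed));

        System.out.println(line.length());        // 1
        System.out.println((int) line.charAt(0)); // 287 (U+011F)
    }
}
```

With real console input you would pass System.in instead of the byte array stream.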

Wrap the InputStream in an InputStreamReader.

int read = new InputStreamReader(System.in).read();
System.out.println((char) read); // prints 'ğ'

If necessary, you can pass a specific Charset to the reader's constructor; otherwise it uses the platform's default charset, which is probably correct.
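As a sketch of the explicit-charset variant, assuming the bytes arriving on the stream are UTF-8: the class Utf8ReaderDemo and the helper readOneChar are names made up here, and a ByteArrayInputStream plays the role of System.in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class Utf8ReaderDemo {
    // Hypothetical helper: reads one character (not one byte), decoding as UTF-8.
    static int readOneChar(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        return reader.read(); // -1 at end of stream
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for System.in: the two UTF-8 bytes of ğ followed by a newline.
        InputStream fake = new ByteArrayInputStream(new byte[] {(byte) 0xC4, (byte) 0x9F, '\n'});
        int c = readOneChar(fake);

        System.out.println(c); // 287 (U+011F)
    }
}
```

An InputStreamReader may read ahead and buffer more bytes than the character it returns, so it is safer to create one reader and keep using it than to mix it with raw reads on the same stream.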
