How can I make the System.in InputStream read UTF-8 characters?
This is my code:
import java.io.InputStream;
import java.util.Scanner;

public class MyTestClass {
    public static void main(String[] args) throws Exception {
        Scanner scanner = new Scanner(System.in);
        String s = scanner.nextLine();
        InputStream inputStream = System.in;
        int read = inputStream.read();
        System.out.println(read);
        System.out.println((char) read);
        System.out.println(s);
    }
}
When I run the program, I input the letter ğ twice. The console output is:
ğ
ğ
196
Ä
ğ
How can I see the correct letter instead of Ä? Scanner seems to do the right thing.
And actually, why doesn't this approach work? What is wrong here?
The javadoc for InputStream#read() states:
Reads the next byte of data from the input stream.
But as it turns out, the character ğ requires 2 bytes for representation in UTF-8. You therefore need to read two bytes. You can use InputStream#read(byte[]):
byte[] buffer = new byte[2];
int bytesRead = inputStream.read(buffer); // may be less than 2; check before decoding
Once the byte array contains the appropriate bytes, you need to decode them as UTF-8. You can do that with:
char val = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(buffer)).get();
The variable val will now contain the decoded character.
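Putting those pieces together, a minimal runnable sketch might look like the following. It assumes the next character on the stream is known to be exactly two bytes long in UTF-8 (as ğ is); the names TwoByteDecode and readTwoByteChar are illustrative only. Because read(byte[], int, int) may return fewer bytes than requested, the sketch loops until the buffer is full before decoding.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TwoByteDecode {

    // Reads exactly two bytes and decodes them as one UTF-8 character.
    // Assumption: the caller already knows the next character is 2 bytes long.
    static char readTwoByteChar(InputStream in) throws IOException {
        byte[] buffer = new byte[2];
        int filled = 0;
        // A single read may return fewer bytes than requested, so keep reading.
        while (filled < buffer.length) {
            int n = in.read(buffer, filled, buffer.length - filled);
            if (n == -1) {
                throw new EOFException("stream ended after " + filled + " byte(s)");
            }
            filled += n;
        }
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(buffer)).get();
    }

    public static void main(String[] args) throws IOException {
        char val = readTwoByteChar(System.in);
        // Whether this displays correctly also depends on the console's output encoding.
        System.out.println(val);
        System.out.println((int) val); // numeric value of the char, e.g. 287 for U+011F
    }
}
```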
Note that some UTF-8 encoded characters need only one byte (and others need three or four), so do what we just did only if you know how many bytes you need. Otherwise, read everything and pass it to the decoder.
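One way to know how many bytes you need is to inspect the first byte: in UTF-8 its high bits encode the length of the sequence (1 to 4 bytes). The sketch below uses hypothetical helpers, not a standard API: readUtf8Char reads one complete character that way, and decodeAll shows the "read everything and decode" alternative. It does not validate continuation bytes; new String(bytes, UTF_8) simply replaces malformed input with U+FFFD.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class Utf8CharReader {

    // Number of bytes in the UTF-8 sequence that starts with leadByte,
    // derived from the high bits of the first byte.
    static int sequenceLength(int leadByte) {
        if ((leadByte & 0x80) == 0x00) return 1; // 0xxxxxxx
        if ((leadByte & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((leadByte & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((leadByte & 0xF8) == 0xF0) return 4; // 11110xxx
        throw new IllegalArgumentException("not a UTF-8 lead byte: " + leadByte);
    }

    // Reads one complete UTF-8 character and returns it as a String
    // (a String rather than a char, because characters outside the
    // Basic Multilingual Plane need two Java chars). Returns null at end of stream.
    static String readUtf8Char(InputStream in) throws IOException {
        int first = in.read();
        if (first == -1) {
            return null;
        }
        byte[] bytes = new byte[sequenceLength(first)];
        bytes[0] = (byte) first;
        for (int i = 1; i < bytes.length; i++) {
            int next = in.read();
            if (next == -1) {
                throw new EOFException("truncated UTF-8 sequence");
            }
            bytes[i] = (byte) next;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Alternative: read every remaining byte, then decode them all at once.
    static String decodeAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String ch = readUtf8Char(System.in);
        System.out.println(ch);
    }
}
```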
InputStream.read() returns the next byte of data, which is a number between 0 and 255 (or -1 at the end of the stream).
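Here, you are simply converting that byte into a char, which in your case gives Ä. In UTF-8, ğ (U+011F) is encoded as two bytes, 0xC4 0x9F. read() returned only the first of them, 196 (0xC4), and casting 196 to char gives U+00C4, which is Ä.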
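To see where the 196 and the Ä come from, this small sketch (the class name Utf8Bytes is illustrative) prints the UTF-8 bytes of ğ and shows what the cast does. It writes ğ as the escape \u011F rather than a literal, so the source compiles the same way under any source-file encoding.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {

    // Formats each byte of the UTF-8 encoding of s as unsigned hex, separated by spaces.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String gBreve = "\u011F"; // LATIN SMALL LETTER G WITH BREVE
        System.out.println(utf8Hex(gBreve)); // C4 9F

        // read() hands back only the first byte, as an int in 0..255.
        int firstByte = gBreve.getBytes(StandardCharsets.UTF_8)[0] & 0xFF;
        System.out.println(firstByte); // 196

        // Casting that int to char treats it as the code point U+00C4,
        // LATIN CAPITAL LETTER A WITH DIAERESIS.
        char c = (char) firstByte;
        System.out.println(c == '\u00C4'); // true
    }
}
```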
Scanner, on the other hand, decodes the incoming bytes into characters (using the platform's default charset unless you give it one) and reads the whole line, which is why you see it output correctly. I suggest you use Scanner over a plain InputStream, since it offers convenient methods for reading text.
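If you follow the Scanner suggestion, you can also pin the charset instead of relying on the platform default. A minimal sketch, assuming the console really sends UTF-8; ScannerUtf8 and readLine are illustrative names. The Scanner(InputStream, String) constructor has existed since Java 5, and Java 10 added an overload that takes a Charset. The helper deliberately does not close the Scanner, because closing it would also close System.in.

```java
import java.io.InputStream;
import java.util.Scanner;

public class ScannerUtf8 {

    // Reads one line, decoding the bytes as UTF-8 regardless of the platform default.
    static String readLine(InputStream in) {
        Scanner scanner = new Scanner(in, "UTF-8");
        return scanner.nextLine();
    }

    public static void main(String[] args) {
        String s = readLine(System.in);
        System.out.println(s);
        // length() counts UTF-16 chars, not bytes: a two-byte g-breve counts as 1.
        System.out.println(s.length());
    }
}
```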
Wrap the InputStream in an InputStreamReader:
int read = new InputStreamReader(System.in).read();
System.out.println((char) read); // prints 'ğ'
If necessary, you can pass a specific Charset to the reader's constructor; by default, it uses the platform's default charset, which is probably correct for your console.
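A sketch of passing the charset explicitly, again assuming the input is UTF-8 (ReaderUtf8 and utf8Reader are illustrative names). Wrapping the reader in a BufferedReader adds readLine(). Note that Reader.read() returns one UTF-16 char, so a character outside the Basic Multilingual Plane (most emoji, for example) arrives as two reads forming a surrogate pair.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReaderUtf8 {

    // Wraps a byte stream in a Reader that decodes UTF-8 explicitly,
    // instead of relying on the platform default charset.
    static Reader utf8Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(utf8Reader(System.in));
        int first = reader.read(); // one decoded UTF-16 char, or -1 at end of stream
        if (first != -1) {
            System.out.println((char) first);
        }
        String rest = reader.readLine(); // remainder of the line, or null at end of stream
        System.out.println(rest);
    }
}
```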