Should InputStreamReader be used with appendCodePoint?

It is a common pattern in Java to read characters from a file with InputStreamReader and append them to a StringBuilder. The obvious way to do it is:

int c = reader.read();
sb.append((char)c);

However, suppose the file (assuming we specified UTF-8 encoding, if it makes a difference) contains a character (strictly speaking, a code point) that doesn't fit in 16 bits. Would the reader return it as a single 32-bit code point instead of a pair of 16-bit chars?

If so, should the last line above actually read:

sb.appendCodePoint(c);

Is there a known test case, i.e. a sequence of UTF-8 bytes, that would distinguish between the two options?

As the Javadoc says, Reader.read() returns whatever it can make of the next piece of input as a single character: an int in the range 0 to 65535, i.e. one 16-bit char. The only exception is the end-of-stream indicator, which is -1 as an int. A code point that doesn't fit in 16 bits therefore arrives as a pair of surrogate chars over two successive calls, never as a single 32-bit value. There is no basis for your suggestion.
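As for a test case, here is a minimal sketch rather than a definitive check. It assumes the four UTF-8 bytes F0 9F 98 80 (the encoding of U+1F600, which lies outside the 16-bit range) and feeds them through an InputStreamReader. The class name SurrogateReadDemo and the helper readAll are made up for illustration. Each read() call should hand back one surrogate char, and both append((char) c) and appendCodePoint(c) should build the same string, because appendCodePoint treats any value up to 0xFFFF as a single char.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the name and helper are illustrative only.
public class SurrogateReadDemo {
    // UTF-8 encoding of U+1F600, a code point that does not fit in 16 bits.
    static final byte[] UTF8_BYTES = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};

    // Collects every value returned by read() until the end-of-stream marker (-1).
    static List<Integer> readAll(byte[] bytes) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                values.add(c);
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder viaCast = new StringBuilder();
        StringBuilder viaCodePoint = new StringBuilder();
        for (int c : readAll(UTF8_BYTES)) {
            // Print each value so the two 16-bit surrogates are visible.
            System.out.printf("read() -> 0x%04X%n", c);
            viaCast.append((char) c);
            viaCodePoint.appendCodePoint(c);
        }
        // Compare the two builders and check the decoded code point.
        System.out.println(viaCast.toString().equals(viaCodePoint.toString()));
        System.out.println(viaCast.codePointAt(0) == 0x1F600);
    }
}
```

The one place the two calls differ is the end-of-stream value: if -1 is passed along without a check, append((char) c) appends the char 0xFFFF, while appendCodePoint(c) throws IllegalArgumentException. Either way, the loop should test for -1 before appending.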
