I searched for the difference between InputStream and Reader, and the answer I found was:
InputStream: byte-based (reads byte by byte)
Reader: character-based (reads char by char)
I pasted the character á into a file. Its ASCII value (or its value in some other charset) is 225 on my OS, and a byte's MAX_VALUE is 127. I used FileInputStream and just called read(), so why does it return 225? How can it read more than one byte, given that read() reads only one byte or character at a time?
Or what actually is the difference between InputStream and Reader?
á does indeed have a Unicode value of 225 (that's its code point, and is unrelated to its encoding). When you cast that down to a byte, you'll get -31. But if you take a careful look at the docs for InputStream.read, you'll see:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255.
(emphasis added) The read method returns an int, not a byte, but that int essentially represents an unsigned byte. If you cast that int down to a char, you'll get back to á. If you cast that int down to a byte, it'll wrap down to -31.
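The two casts described above can be tried directly. This is only a small sketch; the class and method names (CastDemo, toSignedByte, toChar) are made up for illustration and are not part of any library.

```java
// Illustrative sketch: CastDemo and its methods are hypothetical names.
class CastDemo {
    // Narrowing to byte keeps the low 8 bits and reinterprets them as a signed value.
    static byte toSignedByte(int value) {
        return (byte) value;
    }

    // Narrowing to char keeps the low 16 bits; char is an unsigned 16-bit type.
    static char toChar(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        int fromRead = 225; // the value read() returned in the question
        System.out.println(toSignedByte(fromRead));        // prints -31
        System.out.println(toChar(fromRead) == '\u00E1');  // prints true (U+00E1 is á)
    }
}
```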
A bit more detail:
á has a Unicode value of 225. As a 16-bit char, that is 00000000 11100001.
The byte read from the file is 11100001. This has a value of -31 if treated as a signed byte, but 225 if treated as unsigned.
InputStream.read returns an int so that it can represent the stream's end as -1. But if the int is non-negative, then only its bottom 8 bits are set (decimal values 0-255), here 11100001.
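Here is a minimal sketch of the usual read loop, which relies on -1 marking the end of the stream. It uses an in-memory ByteArrayInputStream instead of the FileInputStream from the question so that it runs without a file; ReadLoopDemo and readAll are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: ReadLoopDemo and readAll are hypothetical names.
class ReadLoopDemo {
    // Collects every value returned by read() until it signals end of stream with -1.
    static List<Integer> readAll(byte[] data) {
        List<Integer> values = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int next;
            while ((next = in.read()) != -1) {
                values.add(next); // always in the range 0..255 here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xE1, 0x41 }; // 11100001 and 'A'
        System.out.println(data[0]);       // prints -31: the same bits as a signed byte
        System.out.println(readAll(data)); // prints [225, 65]
    }
}
```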
InputStream.read() returns an int. That is a value between 0 and 255, or -1 at the end of the stream.
Byte.MAX_VALUE is 127 and Byte.MIN_VALUE is -128, which is binary 10000000. Java does not support unsigned integer primitives (char aside), so the most significant bit of a byte is always the sign bit.
The difference is that an InputStream will read the contents of the file as is, with no interpretation: the raw bytes.
A Reader, on the other hand, will use a CharsetDecoder to process the byte input and turn it into a sequence of chars instead. The way it processes the byte input depends on the Charset used.
And this is not a 1 <-> 1 relationship: depending on the charset, a single char may be decoded from several bytes.
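As a rough sketch of that point, the same character á can be encoded as one byte or as two, and a Reader only gives the right char back when it decodes with the matching charset. InputStreamReader stands in here for "a Reader", and an in-memory stream stands in for the file; DecodeDemo and firstChar are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: DecodeDemo and firstChar are hypothetical names.
class DecodeDemo {
    // Decodes the bytes with the given charset and returns the first char as an int,
    // or -1 if there is nothing to read.
    static int firstChar(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "\u00E1".getBytes(StandardCharsets.ISO_8859_1); // one byte: E1
        byte[] utf8 = "\u00E1".getBytes(StandardCharsets.UTF_8);        // two bytes: C3 A1
        System.out.println(latin1.length); // prints 1
        System.out.println(utf8.length);   // prints 2
        System.out.println(firstChar(latin1, StandardCharsets.ISO_8859_1)); // prints 225
        System.out.println(firstChar(utf8, StandardCharsets.UTF_8));        // prints 225
        // Decoding with the wrong charset yields a different char (U+00C3).
        System.out.println(firstChar(utf8, StandardCharsets.ISO_8859_1));   // prints 195
    }
}
```

So a raw read() on the file returns 225 only when the file was saved in a single-byte encoding such as ISO-8859-1. With UTF-8 the first read() would return 195.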
Also, forget about "ASCII values"; Java doesn't use ASCII, it uses Unicode, and a char is in fact a UTF-16 code unit. It was a full code point when Java began, but then Unicode defined code points outside the BMP and Java had to adapt: code points over U+FFFF are now represented using a surrogate pair, i.e. two chars.
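A short sketch of the surrogate-pair point: a code point inside the BMP takes one char, while one above U+FFFF takes two. The code point U+1F600 is just an arbitrary example chosen for this sketch, and SurrogateDemo with its methods are hypothetical names.

```java
// Illustrative sketch: SurrogateDemo and its methods are hypothetical names.
class SurrogateDemo {
    // Number of UTF-16 code units (chars) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String insideBmp = "\u00E1";                                // á, U+00E1
        String outsideBmp = new String(Character.toChars(0x1F600)); // arbitrary code point above U+FFFF
        System.out.println(charCount(insideBmp));       // prints 1
        System.out.println(charCount(outsideBmp));      // prints 2
        System.out.println(codePointCount(outsideBmp)); // prints 1
        System.out.println(Character.isHighSurrogate(outsideBmp.charAt(0))); // prints true
    }
}
```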
Not exactly on topic, but you've hit a Java limitation: it has no unsigned integer types (apart from char).
In C/C++, a byte (char) can be unsigned (0..255) or signed (typically -128..127).
In Java, the choice was made for a signed byte.
So, in order to represent an unsigned byte in Java, we have to go to a wider arithmetic type, in practice int.
The same applies to unsigned int, of course. To see values over 2G in Java you need a long.
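Since Java 8 the standard library has helpers for exactly this widening: Byte.toUnsignedInt and Integer.toUnsignedLong. The sketch below assumes Java 8 or later; UnsignedDemo, widenByte and widenInt are hypothetical names.

```java
// Illustrative sketch (Java 8+): UnsignedDemo and its methods are hypothetical names.
class UnsignedDemo {
    // Reads the 8 bits of a byte as an unsigned value in 0..255.
    static int widenByte(byte b) {
        return Byte.toUnsignedInt(b); // same result as (b & 0xFF)
    }

    // Reads the 32 bits of an int as an unsigned value in 0..4294967295.
    static long widenInt(int i) {
        return Integer.toUnsignedLong(i); // same result as (i & 0xFFFFFFFFL)
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE1;
        System.out.println(b);            // prints -31
        System.out.println(widenByte(b)); // prints 225

        int i = 0x80000000;               // bit pattern for 2G
        System.out.println(i);            // prints -2147483648
        System.out.println(widenInt(i));  // prints 2147483648
    }
}
```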