
What is the actual difference between InputStream and Reader in Java?

When I searched for the difference between InputStream and Reader, this is the answer I found:

InputStream: byte-based (reads byte by byte)

Reader: character-based (reads char by char)

I pasted the character á into a file. Its ASCII value (or maybe its value in some other charset) is 225 on my OS, and a byte's MAX_VALUE is 127. I then used FileInputStream and simply called read(), so why does it return 225? How is it able to read more than one byte, when read() reads only one byte or character at a time?

Or what actually is the difference between InputStream and Reader?

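á does indeed have a Unicode value of 225 (that's its code point, which is a separate thing from its encoding). Your file is evidently stored in a single-byte encoding such as ISO-8859-1, in which á is the single byte 225. When you cast that down to a byte, you'll get -31. But if you take a careful look at the docs for InputStream.read, you'll see: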

Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255.

(emphasis added) The read method returns an int, not a byte, but that int essentially represents an unsigned byte. If you cast that int down to a char, you'll get back to á. If you cast that int down to a byte, it'll wrap down to -31.
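As a minimal sketch of the point above, the following reads a one-byte stream twice. The class name ReadReturnsIntDemo and the helper readTwice are made up for illustration, and a ByteArrayInputStream stands in for the FileInputStream from the question so the example needs no file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of any library.
class ReadReturnsIntDemo {
    // Calls read() twice on a stream over the given bytes and returns both results.
    static int[] readTwice(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return new int[] { in.read(), in.read() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One byte with the bit pattern 11100001 (á in ISO-8859-1).
        int[] r = readTwice(new byte[] { (byte) 0xE1 });

        System.out.println(r[0]);               // 225: the byte as an unsigned value
        System.out.println((byte) r[0]);        // -31: same bits, read as a signed byte
        System.out.println((int) (char) r[0]);  // 225: the char U+00E1, i.e. á
        System.out.println(r[1]);               // -1: end of stream
    }
}
```

The int return type is what lets read() distinguish the data byte 255 from the end-of-stream marker -1.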

A bit more detail:

  • á has a Unicode value of 225.
  • chars in Java are represented as UTF-16, which for 225 has a binary representation of 00000000 11100001
  • if you cast that down to a byte, it'll drop the high byte, leaving you with 11100001. This has a value of -31 if treated as a signed byte, but 225 if treated as unsigned.
  • InputStream.read returns an int so that it can represent the stream's end as -1. But if the int is non-negative, then only its bottom 8 bits can be set (decimal values 0-255)
  • When you cast that int down to a byte, Java will drop all but the lowest 8 bits -- leaving you again with 11100001
  • InputStream.read() returns an int: a value between 0 and 255, or -1 at the end of the stream.

  • Byte.MAX_VALUE is 127 but Byte.MIN_VALUE is -128, which is binary 10000000. Java has no unsigned byte type, so the most significant bit of a byte is always the sign bit.
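The bit patterns in the list above can be checked directly. This is only a sketch: SignedByteDemo and bits8 are hypothetical names introduced here, while Byte.toUnsignedInt and Integer.toBinaryString are standard library methods.

```java
// Hypothetical demo class, not part of any library.
class SignedByteDemo {
    // Formats the low 8 bits of a value as a zero-padded binary string.
    static String bits8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        char c = '\u00E1';   // á, code point 225
        byte b = (byte) c;   // keeps only the low 8 bits

        System.out.println(bits8(b));                  // 11100001
        System.out.println(b);                         // -31 (signed view)
        System.out.println(b & 0xFF);                  // 225 (unsigned view via masking)
        System.out.println(Byte.toUnsignedInt(b));     // 225 (same, using the library helper)
        System.out.println(bits8(Byte.MIN_VALUE));     // 10000000
        System.out.println(bits8(Byte.MAX_VALUE));     // 01111111
    }
}
```

Masking with 0xFF is the usual way to recover the 0 to 255 value from a signed byte, which is what read() effectively hands back.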

The difference is that an InputStream will read the contents of the file as is, with no interpretation: the raw bytes.

A Reader on the other hand will use a CharsetDecoder to process the byte input and turn it into a sequence of char s instead. And the way it will process the byte input will depend on the Charset used.

And this is not a one-to-one relationship! Depending on the Charset, several bytes may decode to a single char.
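To make the byte/char distinction concrete, here is a small sketch that encodes á two ways and decodes it back through a Reader. The class BytesVersusCharsDemo and the helper firstChar are illustrative names only; InputStreamReader is the standard bridge from bytes to chars.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class BytesVersusCharsDemo {
    // Decodes the bytes with the given charset and returns the first char as an int.
    static int firstChar(byte[] data, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00E1"; // á
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);         // 0xC3 0xA1
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);  // 0xE1

        System.out.println(utf8.length);    // 2: two raw bytes for one character
        System.out.println(latin1.length);  // 1

        // With the matching charset, both byte sequences decode to the same char.
        System.out.println(firstChar(utf8, StandardCharsets.UTF_8));         // 225
        System.out.println(firstChar(latin1, StandardCharsets.ISO_8859_1));  // 225

        // With the wrong charset, the same bytes decode to a different char.
        System.out.println(firstChar(utf8, StandardCharsets.ISO_8859_1));    // 195
    }
}
```

An InputStream over the UTF-8 bytes would return 195 and then 161; only the Reader, given the right Charset, turns them back into 225.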

Also, forget about "ASCII values"; Java doesn't use ASCII, it uses Unicode, and a char is in fact a UTF-16 code unit. It was a full code point when Java began, but then Unicode defined code points outside the BMP and Java had to adapt: code points over U+FFFF are now represented using a surrogate pair, i.e. two chars.
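The surrogate-pair behaviour is easy to observe. In this sketch the class SurrogatePairDemo and the helper fromCodePoint are hypothetical names, and U+1F600 is just an arbitrary code point outside the BMP chosen for the example.

```java
// Hypothetical demo class, not part of any library.
class SurrogatePairDemo {
    // Builds a String holding exactly one code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String bmp = fromCodePoint(0x00E1);      // á, inside the BMP
        String astral = fromCodePoint(0x1F600);  // outside the BMP

        System.out.println(bmp.length());                               // 1 char
        System.out.println(astral.length());                            // 2 chars
        System.out.println(astral.codePointCount(0, astral.length()));  // 1 code point

        System.out.println(Character.isHighSurrogate(astral.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(astral.charAt(1)));  // true
        System.out.println(astral.codePointAt(0) == 0x1F600);            // true
    }
}
```

So String.length() counts UTF-16 code units, not characters as a user would see them.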

See here for a more detailed explanation.

Not exactly on topic, but you've hit a Java limitation: it has no unsigned integer types (apart from the 16-bit char).

In C/C++, a char (the closest thing to a byte) can be unsigned, 0..255, or signed, typically -128..127.

In Java, the choice was made for a signed byte.

So, in order to represent an unsigned byte in Java, we have to go to the next higher arithmetic type, int.

The same applies to unsigned int, of course. To hold values above 2^31 - 1 (roughly 2G) in Java, you need a long.
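A short sketch of the same widening trick one level up, for 32-bit values. UnsignedIntDemo and asUnsigned are illustrative names; Integer.toUnsignedLong and Integer.toUnsignedString are standard methods available since Java 8.

```java
// Hypothetical demo class, not part of any library.
class UnsignedIntDemo {
    // Reinterprets the 32 bits of an int as an unsigned value held in a long.
    static long asUnsigned(int value) {
        return value & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        int allBits = 0xFFFFFFFF;              // all 32 bits set
        int wrapped = Integer.MAX_VALUE + 1;   // overflows to Integer.MIN_VALUE

        System.out.println(allBits);                            // -1
        System.out.println(asUnsigned(allBits));                // 4294967295
        System.out.println(Integer.toUnsignedLong(allBits));    // 4294967295
        System.out.println(Integer.toUnsignedString(allBits));  // 4294967295

        System.out.println(wrapped);                            // -2147483648
        System.out.println(asUnsigned(wrapped));                // 2147483648
    }
}
```

The mask must be the long literal 0xFFFFFFFFL; with an int mask the result would stay a negative int.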
