
Raw data (bytes) and signed/unsigned variables

I've been told that whenever you work with bytes, you should declare your variables as unsigned chars. In Windows' data types, BYTE is declared as an unsigned char.

My questions:

Why?

An unsigned char represents integers from 0 to 255, and a signed char typically represents -128 to 127.

If that's the case, then how is EOF (-1) caught when reading binary data?

EOF is defined in stdio.h as a macro with the value -1.

When you read characters from a stream, the return type of functions like std::getc is int, not char. The constant EOF is also of type int, not char or unsigned char.
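As a minimal sketch of that point, the loop below keeps the result of std::getc in an int, so a data byte of 0xFF (returned as 255) never compares equal to EOF. The helper name count_bytes is hypothetical and only serves the illustration.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: counts the bytes remaining in an open stream.
std::size_t count_bytes(std::FILE* stream) {
    std::size_t count = 0;
    int c;  // int, not char: it has to hold every byte value and EOF
    while ((c = std::getc(stream)) != EOF) {
        ++count;  // c is a byte value here, in the range 0..255
    }
    return count;
}
```

Declaring c as char instead would either stop early on a 0xFF byte or never see EOF, depending on whether char is signed on the platform.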

Even in the C++ I/O API, streams like std::ifstream deal with two types: char_type (the type of the characters in the stream) and int_type, a type that can hold every value of char_type plus EOF.

EOF is status information and is distinct from the data. However, some functions have a habit of using a single return type for both. Example:

/* Return next data byte (0 - 255) or EOF (-1) if there was an error */
int readByte(...);

The point is that you need a type larger than a plain byte to be able to do this.
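One way such a function could be written is sketched below. The original readByte leaves its parameters out, so the ByteSource struct and the read_byte signature here are purely hypothetical: they read from an in-memory buffer and return either a byte value or EOF through the same int.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical in-memory byte source, not taken from the original post.
struct ByteSource {
    const unsigned char* data;
    std::size_t size;
    std::size_t pos;
};

// Returns the next byte as 0..255, or EOF once the buffer is exhausted.
int read_byte(ByteSource& src) {
    if (src.pos >= src.size) {
        return EOF;  // status, outside the range of any data byte
    }
    return src.data[src.pos++];  // unsigned char widens to a non-negative int
}
```

Because the buffer holds unsigned char, a stored 0xFF comes back as 255 and cannot be mistaken for the negative EOF value.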

You use EOF with functions like getchar(), which return int. The legal byte values come back as the int values 0 to 255, which map directly onto unsigned char (0 to 255), and EOF remains distinguishable as the int value -1.

Quoting cplusplus.com on the return value of getchar:

On success, the character read is returned (promoted to an int value). The return type is int to accommodate for the special value EOF, which indicates failure: If the standard input was at the end-of-file, the function returns EOF and sets the eof indicator (feof) of stdin. If some other reading error happens, the function also returns EOF, but sets its error indicator (ferror) instead.

fgetc (one of the basic functions that can return EOF) is declared to return an int.

When you look into the CRT functions, you can see that all functions that can return EOF have an int return type.

So in fact you always get a truncation when you call fgetc and store the result in a char.
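The cost of that truncation can be sketched with a small check. The function below is hypothetical: it takes a value as fgetc would return it, squeezes it into a plain char, and reports whether the "is this EOF?" answer is still the same afterwards. It assumes the usual two's-complement conversion and an EOF value of -1.

```cpp
#include <cstdio>

// Hypothetical check: does storing an fgetc result in a plain char
// preserve the answer to "was this EOF?"
bool survives_char_storage(int from_fgetc) {
    char stored = static_cast<char>(from_fgetc);  // truncation happens here
    bool looked_like_eof = (stored == EOF);       // stored is widened back to int
    bool really_was_eof = (from_fgetc == EOF);
    return looked_like_eof == really_was_eof;
}
```

Where char is signed, the data byte 255 fails this check because it turns into -1. Where char is unsigned, EOF itself fails because it turns into 255. Ordinary values such as 'A' pass either way.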

There is no great difference between unsigned char and char in how a byte is stored. The real difference appears when the compiler converts the value to an int. Where char is signed, you get a sign extension, so the byte 0xFF becomes -1; with unsigned char you don't, so it stays 255.
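A short sketch of that conversion, using explicitly signed and unsigned types so the result does not depend on the platform's plain char. Both helper names are made up for this example, and the signed case assumes a two's-complement target.

```cpp
// Hypothetical helpers: widen one raw byte to int through each char type.
int widen_as_signed(unsigned char raw) {
    signed char c = static_cast<signed char>(raw);  // 0xFF becomes -1
    return c;                                       // sign-extended to int
}

int widen_as_unsigned(unsigned char raw) {
    return raw;  // zero-extended: the value stays in 0..255
}
```

With these, widen_as_signed(0xFF) yields -1 while widen_as_unsigned(0xFF) yields 255, which is why raw bytes are usually kept in unsigned char.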


 