
char * vs unsigned char* what is the difference?

In Linux with C, I don't understand the difference between char* and unsigned char* when reading/writing a binary buffer.

When must I avoid char* and use unsigned char* instead?

First recall C has unsigned char , signed char and char : 3 distinct types. char has the same range as either unsigned char or signed char .

[Edit]

OP later added the condition about reading/writing a binary buffer, so this section addresses that. The sections further below (my original post) deal with "what is the difference between char* and unsigned char* " using a sample case without that read/write concern.

Reading/writing binary data via <stdio.h> can be done with any I/O function, although it is more common to use fread()/fwrite() .

For byte-oriented data, all I/O functions behave as the standard describes:

The byte input functions read characters from the stream as if by successive calls to the fgetc function. C17dr § 7.21.3 11
The byte output functions write characters to the stream as if by successive calls to the fputc function. § 7.21.3 12

So let us look at those two.

... the fgetc function obtains that character as an unsigned char ... § 7.21.7.1 2
The fputc function writes the character specified by c (converted to an unsigned char ) § 7.21.7.3 2

Thus all I/O at the lowest level is best thought of as reading/writing unsigned char .
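A minimal sketch of that point, assuming a hosted implementation where tmpfile() succeeds. The helper name roundtrip_byte is made up for illustration. It writes one byte with fputc() and reads it back with fgetc():

```c
#include <stdio.h>

/* Hypothetical helper: write one byte to a temporary file and read it back.
   Returns the value reported by fgetc(), or EOF on any failure. */
int roundtrip_byte(int value) {
  FILE *f = tmpfile();            /* temporary binary file, opened for update */
  if (f == NULL) return EOF;
  int result = EOF;
  if (fputc(value, f) != EOF) {   /* value is converted to unsigned char */
    rewind(f);
    result = fgetc(f);            /* an unsigned char value converted to int */
  }
  fclose(f);
  return result;
}
```

With 8-bit bytes, roundtrip_byte(0xFF) should come back as 255 rather than a negative number, so the byte stays distinguishable from EOF even when char is signed.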

Now to directly address

When must I avoid char* and use unsigned char* instead? (OP)

When writing, pointers such as char* , unsigned char* or others can be used in OP's code, yet the underlying output function accesses the data via unsigned char * . This has no impact on how OP's write executes, except that if char were encoded as ones' complement or sign-magnitude, a trap representation would go undetected.

Likewise with reading, the underlying input function saves data via unsigned char * and no traps occur. A single byte read via int fgetc() would report values in the unsigned char range even if char is signed .

The importance of using unsigned char* vs. char* when reading/writing a binary buffer lies not so much in the I/O call itself (it is all unsigned char * access), but in setting up the data before writing and interpreting the data after reading - see memcmp() below.
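As a rough illustration of the interpretation step, here is a sketch that checks whether a buffer starts with the byte 0x89 followed by 'P'. The function name and the marker bytes are assumptions chosen for the example, not something from the question:

```c
#include <stddef.h>

/* Hypothetical check: does the buffer start with byte 0x89 followed by 'P'? */
int starts_with_marker(const void *buf, size_t len) {
  const unsigned char *p = buf;   /* view the raw bytes as unsigned char */
  return len >= 2 && p[0] == 0x89 && p[1] == 'P';
}
```

Had p been declared as const char * on a platform where char is signed and 8 bits wide, p[0] would hold a negative value for that byte, and the comparison with 0x89 could never be true.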



When must I avoid char* and use unsigned char* instead?

A good example is with string related code.

Although functions in <string.h> use char* in function parameters, the implementations perform as if char was unsigned char , even when char is signed .

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value). C17dr § 7.24.1 3

So even if char is a signed char , functions like int strcmp(const char *a, const char *b) perform as if int strcmp(const unsigned char *a, const unsigned char *b) .

  1. This makes a difference when strings differ by a char c and a char d with values of different signs.
    E.g., assume c < 0 and d > 0 when read as signed char.

    // Accessed via char * and char is signed
    c < d is true
    // Accessed via unsigned char *
    c < d is false

This results in a different sign from the strcmp() return and so affects sorting strings.

// Incorrect code when `char` is signed.
int strcmp(const char *a, const char *b) {
  while (*a == *b && *a) { a++; b++; }
  return (*a > *b) - (*a < *b);
}

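// Correct code when `char` is signed or unsigned, 2's complement or not
int strcmp(const char *a, const char *b) {
  // Convert the pointers first so every byte is read as `unsigned char`.
  const unsigned char *ua = (const unsigned char *) a;
  const unsigned char *ub = (const unsigned char *) b;
  while (*ua == *ub && *ua) { ua++; ub++; }
  return (*ua > *ub) - (*ua < *ub);
}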

[Edit]

The same applies to binary data read and compared with memcmp() .
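A small sketch of that, with a made-up helper name compare_records, relying only on the standard memcmp():

```c
#include <string.h>

/* Hypothetical helper: order two equal-length binary records.
   Returns -1, 0 or 1; memcmp() compares the bytes as unsigned char. */
int compare_records(const void *a, const void *b, size_t len) {
  int r = memcmp(a, b, len);
  return (r > 0) - (r < 0);
}
```

Comparing the single bytes 0x80 and 0x7F this way reports the first as greater (128 versus 127), whereas a hand-written loop over signed char values would order them the other way around.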

  2. In old C implementations that did not use 2's complement, there could be 2 zeros: +0 and -0 (or trap).

+0 ends a string when properly viewed as an unsigned char . -0 is not a null character that terminates a string, even though as a signed char it has a value of zero.

// Incorrect code when `char` is signed and not 2's complement.
// Conversion to `unsigned char` done too late.
int strcmp(const char *a, const char *b) {
  while ((unsigned char)*a == (unsigned char)*b && (unsigned char)*a) { a++; b++; }
  return ((unsigned char)*a > (unsigned char)*b) - ((unsigned char)*a < (unsigned char)*b);
}
