
C program confusing from Ritchie and Kernighan

I have read everywhere that C is a must know language before you graduate college if you are in computer science. I have picked up the book Ritchie and Kernighan 2nd edition recommended by my professor. I am quite confused by how this program is working. If anyone could explain it that would be great! The program counts the occurrences of each digit, white spaces and other characters. I wrote comments for things I understand already and questions are added in there where I don't understand how it worked.

#include <stdio.h>

int main()
{
     int c, i, nwhite, nother;   //Declaring variables
     int ndigit[10];             //Declaring an array that holds indexes from 0-9

     nwhite = nother = 0;        //Initializes two variables to 0

     for (i = 0; i < 10; ++i)    //Does this for loop keep going through and for each index
             ndigit[i] = 0;      //0-9 set the value of it to 0?

     while ((c = getchar()) != EOF)  //While loop to check if char at EOF
     {
          if (c>='0' && c <= '9')        //Completely lost on how this works?
               ++ndigit[c - '0'];
          else if (c == ' ' || c == '\n' || c == '\t')     //Checks for white space 
               ++nwhite;
          else                                   //If nothing else increment char variable
               ++nother;
     }

     printf("digits ");               //First for loop traverses and prints each value of
     for (i=0; i<10; ++i)             //the index. The next printf prints white space and
          printf(" %d", ndigit[i]);   //other characters
     printf(", white space = %d, other = %d\n", nwhite, nother);
}

Yes, this loops over the entire ndigit array and sets each element to 0.

for (i = 0; i < 10; ++i)   
    ndigit[i] = 0; 

here:

if (c>='0' && c <= '9')
    ++ndigit[c - '0'];

the condition (on the if -line) might be the most difficult thing to understand:

imagine it like this:

if( A and B)
    ...

with:

A = c >= '0'

and

B = c<= '9'

that is, if c is between '0' and '9' (inclusive).

hope this helps.

(c >= '0' && c <= '9')        //Completely lost on how this works?

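The char type in C is just a small integer type: it is at least 8 bits wide, and whether it is signed is implementation-defined. The correspondence between characters such as '1' and '9' and integers such as 49 and 57 is defined by the character set, which is ASCII on most systems. The code snippet above evaluates to 1 if c is a digit and to 0 otherwise. On an ASCII system, writing '1' in C is equivalent to writing 49 .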

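The array is declared inside main without an initializer, so it has automatic storage duration (the auto storage class), and in C the elements of such an array start out with indeterminate (garbage) values. Hence each element is explicitly set to zero: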

for (i = 0; i < 10; ++i)   
   ndigit[i] = 0; 

The if statement checks whether the given character is a digit from 0 to 9. Since getchar() returns characters, the values are compared against character constants, which are written in single quotes:

if (c>='0' && c <= '9') 

You can also change this statement to

if (c>=48 && c <= 57)

since the ASCII code for '0' is 48 and for '9' is 57.

++ndigit[c - '0'];

This keeps a count of the digits entered in the array. For example, if the input is 5, the count stored at index 5 of the array is increased by 1.

c - '0' is used because, as mentioned above, the ASCII code for '5' is 53, so the expression evaluates to 53 - 48 = 5 (index 5 of the array).

 int ndigit[10];             //Declaring an array that holds indexes from 0-9

 for (i = 0; i < 10; ++i)    //Does this for loop keep going through and for each index

In C arrays are indexed starting from 0, so in this case the valid indices of ndigit are 0 through 9, inclusive. The for loop starts from i = 0 and increments i by one ( ++i ) as long as i < 10 , so the loop body gets executed for values of i from 0 to 9, inclusive.

In this program it is left up to the reader to figure out that the two 10 literals are the same thing; arguably it would be better style to #define a constant and use it in both places (but in this case it might be forgiven since the number of decimal digits is constant at 10).

 while ((c = getchar()) !=EOF)  //While loop to check if char at EOF

getchar returns one character read from the standard input, or EOF (which has a value distinct from any char ) if the standard input is at end of file. There is no EOF character that would signal this condition.

      if (c>='0' && c <= '9')        //Completely lost on how this works?

The values of the characters for decimal digits 0 through 9 are contiguous in ASCII, and the C standard requires them to be contiguous in every character set an implementation uses, so this code takes advantage of that and checks if c is equal to or greater than '0' and ( && ) also equal to or less than '9' . In other words, this checks if c is in the range between '0' and '9' , inclusive. '0' and '9' are character literals that have the integer values of the corresponding characters in the character set. This way the programmer does not have to know their values or hard-code them, as in if (c>=48 && c<=57) , and the code also works on character sets other than ASCII.

           ++ndigit[c - '0'];

This counts the number of times each digit occurs. Again since the characters representing the digits are contiguous, subtracting the value of the first digit ( '0' ) from the character that was read results in values 0 through 9 for the corresponding characters, which are also the valid indices of the array ndigit as discussed above. Of course this subtraction only works on digit characters, which is why the preceding if checks for that.

For example, in ASCII '0' has the value 48, '1' is 49, etc. So if c was '1' , then the result of the subtraction c - '0' is '1'-'0' , ie, 49-48, or 1.

The ++ operator increments the value in the corresponding index of the ndigit array. (The first for loop set the initial values in ndigit to zero.)

EOF, an abbreviation for end-of-file, is a macro defined in the stdio.h header file. A macro is a fragment of code which has been given a name; wherever the name is used, it is replaced by the contents of the macro. EOF is a value that represents the condition in which no more characters can be read from a source, which in this case is stdin , the standard input stream, usually connected to the keyboard. In short, for any actual character c , the condition c == EOF is always false. So the following loop keeps reading from stdin until you signal end of file by pressing Ctrl+D on *nix systems or Ctrl+Z on Windows systems:

while ((c = getchar()) != EOF) {
}

Next, this condition checks whether the character read c , is a numeric character, ie, a decimal digit.

if(c>='0' && c <= '9') {  // check if c is a numeric character.
} 

The characters in C are actually integer values (think of them as 1-byte integers) and on most systems are equal to the ASCII codes of the corresponding characters (though ASCII itself is not mandated by the C standard). In ASCII, the code values for characters 0 through 9 are consecutive and in increasing order. So, if the above condition is satisfied, then this means c is a numeric character.

++ndigit[c - '0'];

In the above statement, c - '0' is the numeric character converted to integer value. So, if the value of c was '8' , then c - '0' evaluates to 8 . Next, it fetches the value in the array ndigit at index 8 , which keeps count of the digit 8 , and increments it by 1 .
