
What does ++letter[c - 'A']; mean in this C program?

I have the following program, which counts the number of occurrences of every uppercase letter in a text file. I am commenting every line to understand what happens when the program is executed, but I got lost at the line ++letter[c - 'A'];. I think it's supposed to take the letter and add one to the running count of that letter's occurrences, but the notation c - 'A' is a bit alien to me.

#include <stdio.h>

int main(int argc, char *argv[])
{
    int c, i, letter[26];
    FILE *ifp, *ofp; // pointers to the input file and the output file

    if (argc != 3) // if the number of arguments is not equal to 3
        printf("\nusage: %s infile outfile\n\n", argv[0]); // prints e.g. "usage: letter infile outfile"
    else { // if the number of arguments is three
        ifp = fopen(argv[1], "r"); // opens the first file in reading mode
        ofp = fopen(argv[2], "w"); // opens the second file in writing mode
        if (ifp == NULL || ofp == NULL) { // stop if either file could not be opened
            perror("fopen");
            return 1;
        }
        for (i = 0; i < 26; ++i) // initialize array to zero: the counter
            letter[i] = 0;       // for every letter starts at 0
        while ((c = getc(ifp)) != EOF) // gets the next character of the file
            if ('A' <= c && c <= 'Z') // if the character is an uppercase letter between A and Z
                ++letter[c - 'A'];
        for (i = 0; i < 26; ++i) {
            if (i % 6 == 0)
                fprintf(ofp, "\n");
            fprintf(ofp, "%5c: %5d", 'A' + i, letter[i]);
        }
        fprintf(ofp, "\n\n");
        fclose(ifp);
        fclose(ofp);
    }
    return 0;
}

It counts the number of occurrences of each uppercase letter.


'A' is just a number. On an ASCII-based machine, 'A' is just another way of writing 65.

In ASCII, the 26 Latin uppercase letters are consecutive, so

  • 'A'-'A' = 0
  • 'B'-'A' = 1
  • 'C'-'A' = 2
  • ...
  • 'Z'-'A' = 25

(Note that this code doesn't work on EBCDIC machines because the letters aren't contiguous in that encoding.)

So letter[c - 'A'] produces a unique element of letter for each letter.

Finally, ++x increments the value of x, so ++letter[c - 'A'] increments the element of letter that corresponds to the letter in c.

For example, if we read in ABRACADABRA, we'll end up with

int letter[26] = {
    /* A  B  C  D  E  F  G  H  I  J  K  L  M */
       5, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    /* N  O  P  Q  R  S  T  U  V  W  X  Y  Z */
       0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0
};

Your array

int letter[26];

is supposed to contain a counter for each capital letter.

In ASCII, every character is represented by a 7-bit code, which C stores in a char (normally 8 bits). The capital letters occupy the contiguous range 65-90:

  • A is represented by value 65
  • B is represented by value 66
  • ... and so on

So how do you translate the input character into an index into your array of counters? Just calculate c - 'A': its value will be 0 for A, 1 for B, and so on.

In conclusion, ++letter[c - 'A'] just increments the counter corresponding to character c.

The program declares an array

int c, i, letter[26];

that is evidently used to count letters in an input file.

In this loop

  while (( c=getc(ifp)) != EOF) // gets the characters (next of a file)
      if ('A' <= c && c <= 'Z') // if the character found is an uppercase 
      // character between A and Z
          ++letter[c - 'A']; // 

If c contains a character between 'A' and 'Z', the expression c - 'A' is used to get the index into the array. For example, if c is equal to 'A', the expression yields 0; if c is equal to 'B', it yields 1; and so on. So the element with index 0 corresponds to the letter 'A', the element with index 1 corresponds to the letter 'B', and so on. The value of the element is increased in this statement

++letter[c - 'A'];

So each element of the array accumulates the frequency of the corresponding letter in the input file.

regarding:

if ('A' <= c && c <= 'Z') 
     ++letter[c - 'A'];

would be much better written as:

if( isupper( c ) )
{
    letter[ c-'A' ]++;
}

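where isupper() is declared in the header file ctype.h. Note that isupper() is locale-dependent: in the default "C" locale it is true only for 'A' through 'Z', but in other locales it can be true for additional characters, for which c - 'A' would fall outside letter[].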

regarding:

for (i = 0; i < 26; ++i) 
   letter[i] = 0;

can be eliminated IF this:

int c, i, letter[26];

had been written as:

int c, i, letter[26] = {0};

Now for your question:

letter[] is an array with one counter per capital (ASCII) letter: letter[0] is the counter for the letter A, ..., letter[25] is the counter for the letter Z.

This calculation:

[ c-'A' ]

is taking the ordinal value in the variable c and subtracting the ordinal value of the capital letter A. I.e., if c contains A, the result is 0 (which matches the index for the letter A in the array letter[]).

The ++ increments that element of the array letter[].

The overall result is that the entry in letter[] for the current letter in c is incremented, thereby keeping a count of the number of occurrences of each capital letter encountered in the input.

c is the character code that was read from the file. 'A' is the character code of the letter A. So c - 'A' subtracts those two codes.

This takes advantage of the fact that the character codes for the letters are sequential (as they are in ASCII). So when c is 'D', c - 'A' will be 3.

The ASCII value of 'A' is 65.

c is the character code that was read from the file.

So when we write ++letter[c-'A'], c holds the current character, which also has an ASCII value.

Suppose c is 'B', whose ASCII code is 66.

Then we increment the count at index 66 - 65, that is, 1.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.