I have the following program, which counts the number of occurrences of every uppercase letter in a text file. I am commenting every line to understand what happens when the program is executed, but I got lost at the following line: ++letter[c - 'A'];
I can't figure it out. I think it's supposed to take the letter and store the cumulative count of occurrences of that letter, but the notation c - 'A' is a bit alien to me.
#include <stdio.h>
int main(int argc, char *argv[])
{
    int c, i, letter[26];
    FILE *ifp, *ofp;                  // input file and output file pointers
    if (argc != 3) {                  // if the number of arguments is not 3,
        // print the message "usage: <program> infile outfile"
        printf("\nusage: %s infile outfile\n\n", argv[0]);
        return 1;
    }
    ifp = fopen(argv[1], "r");        // open the first file in read mode
    if (ifp == NULL) {
        perror(argv[1]);
        return 1;
    }
    ofp = fopen(argv[2], "w");        // then open the second file in write mode
    if (ofp == NULL) {
        perror(argv[2]);
        fclose(ifp);
        return 1;
    }
    for (i = 0; i < 26; ++i)          // initialize the array to zero:
        letter[i] = 0;                // the counter for every letter starts at 0
    while ((c = getc(ifp)) != EOF)    // get the next character of the file
        if ('A' <= c && c <= 'Z')     // if the character is an uppercase
                                      // letter between A and Z
            ++letter[c - 'A'];        // <- the line I don't understand
    for (i = 0; i < 26; ++i) {        // write the counts, six per line
        if (i % 6 == 0)
            fprintf(ofp, "\n");
        fprintf(ofp, "%5c: %5d", 'A' + i, letter[i]);
    }
    fprintf(ofp, "\n\n");
    fclose(ifp);
    fclose(ofp);
    return 0;
}
It counts the number of occurrences of each letter.
'A' is just a number. On an ASCII-based machine, 'A' is just another way of writing 65.
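You can check this yourself with a minimal C sketch that prints a character constant as a number. The names show_code and a_is_65 are hypothetical, chosen only for illustration, and the value 65 assumes an ASCII-based machine:

```c
#include <stdio.h>

/* Hypothetical helper: print a character both as a character and as its
   integer code. On an ASCII-based machine, show_code('A') prints "A = 65". */
void show_code(char ch)
{
    printf("%c = %d\n", ch, ch);
}

/* Returns 1 if 'A' has the ASCII code 65 on this machine, 0 otherwise.
   In C a character constant such as 'A' has type int, so it can be
   compared with, and used in arithmetic like, any other int. */
int a_is_65(void)
{
    return 'A' == 65;
}
```

On an ASCII machine, calling show_code('A'), show_code('B') and show_code('Z') prints A = 65, B = 66 and Z = 90.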
In ASCII, the 26 Latin uppercase letters are found consecutively, so:
'A' - 'A' = 0
'B' - 'A' = 1
'C' - 'A' = 2
...
'Z' - 'A' = 25
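The following sketch prints this table and wraps the subtraction in a function. The names print_offsets and offset_from_A are hypothetical, and the results assume contiguous letter codes, as in ASCII:

```c
#include <stdio.h>

/* Print c - 'A' for every uppercase letter.
   Assumes the letters are contiguous, as they are in ASCII. */
void print_offsets(void)
{
    for (int c = 'A'; c <= 'Z'; ++c)
        printf("'%c' - 'A' = %d\n", c, c - 'A');
}

/* Return the offset of an uppercase letter from 'A' (0 for 'A', 25 for 'Z'). */
int offset_from_A(int c)
{
    return c - 'A';
}
```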
(Note that this code doesn't work on EBCDIC machines because the letters aren't contiguous in that encoding.)
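If portability to encodings with non-contiguous letters matters, one option is to look the character up in a string of the 26 letters instead of subtracting. The original program does not do this. The sketch below uses a hypothetical helper, letter_index, and assumes c is a value returned by getc() (an unsigned char value or EOF):

```c
#include <string.h>

/* Hypothetical helper: map an uppercase Latin letter to 0..25 without
   assuming the letters are contiguous, so it also works on EBCDIC.
   Returns -1 if c is not one of the 26 uppercase letters. */
int letter_index(int c)
{
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c == '\0')          /* strchr would match the terminating '\0' */
        return -1;
    p = strchr(alphabet, c);
    return p ? (int)(p - alphabet) : -1;
}
```

In the question's loop, this would replace both the range check and the subtraction: if ((i = letter_index(c)) >= 0) ++letter[i];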
So letter[c - 'A'] produces a unique element of letter for each letter.
Finally, ++x increments the value of x, so ++letter[c - 'A'] increments the element of letter that corresponds to the letter in c.
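Written out step by step, the statement does the following. This is a sketch: count_letter is a hypothetical function, and it assumes the caller has already checked that c is between 'A' and 'Z':

```c
/* Hypothetical function that spells out what ++letter[c - 'A'] does.
   Assumes 'A' <= c && c <= 'Z' and contiguous letter codes (ASCII). */
void count_letter(int letter[26], int c)
{
    int index = c - 'A';                /* 'A' -> 0, 'B' -> 1, ..., 'Z' -> 25 */
    letter[index] = letter[index] + 1;  /* same effect as ++letter[index] */
}
```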
For example, if we read in ABRACADABRA, we'll end up with:
int letter[26] = {
   /* A  B  C  D  E  F  G  H  I  J  K  L  M */
      5, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   /* N  O  P  Q  R  S  T  U  V  W  X  Y  Z */
      0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0
};
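Applying the same counting logic to a string instead of a file reproduces this table. In the sketch below, count_upper is a hypothetical name:

```c
/* Count the uppercase letters in a string the same way the program
   counts them in a file. Assumes contiguous letter codes (ASCII). */
void count_upper(const char *s, int letter[26])
{
    for (int i = 0; i < 26; ++i)
        letter[i] = 0;
    for (; *s != '\0'; ++s)
        if ('A' <= *s && *s <= 'Z')
            ++letter[*s - 'A'];
}
```

After count_upper("ABRACADABRA", letter), letter[0] is 5, letter[1] is 2 and letter['R' - 'A'] (index 17) is 2.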
Your array
int letter[26];
is supposed to contain a counter for each capital letter.
In ASCII, every character is represented by a small integer code (ASCII is a 7-bit code, and each character is stored in a char, which is normally 8 bits wide). The capital letters occupy the contiguous range [65, 90]:
A is represented by the value 65
B is represented by the value 66
...
Z is represented by the value 90
So how do you translate the input char into an index into your array of counters? Just calculate c - 'A': its value will be 0 for A, 1 for B, and so on.
In conclusion, ++letter[c - 'A'] just increments the counter corresponding to the character c.
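The mapping also works in the other direction: index i corresponds to the character 'A' + i, which is how the program's output loop prints the letter next to each count. The sketch below uses hypothetical helper names and again assumes ASCII:

```c
#include <stdio.h>

/* Index i (0..25) corresponds to the character 'A' + i.
   Assumes contiguous letter codes, as in ASCII. */
char letter_for_index(int i)
{
    return (char)('A' + i);
}

/* Print only the letters that occurred, one per line, e.g. "A: 5". */
void print_nonzero(const int letter[26])
{
    for (int i = 0; i < 26; ++i)
        if (letter[i] != 0)
            printf("%c: %d\n", letter_for_index(i), letter[i]);
}
```

With the counts from ABRACADABRA, print_nonzero prints A: 5, B: 2, C: 1, D: 1 and R: 2, each on its own line.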
The program declares the array
int c, i, letter[26];
that is evidently used to count letters in an input file.
In this loop
while ((c = getc(ifp)) != EOF)    // get the next character of the file
    if ('A' <= c && c <= 'Z')     // if the character is an uppercase
                                  // letter between A and Z
        ++letter[c - 'A'];
if the character c is in the range ['A', 'Z'], the expression c - 'A' is used to get the index into the array. For example, if c is equal to 'A', the expression yields 0. If c is equal to 'B', the expression yields 1, and so on. So the element of the array with index 0 corresponds to the letter 'A', the element with index 1 corresponds to the letter 'B', and so on. The value of the element is increased in this statement:
++letter[c - 'A'];
So each element of the array accumulates the frequency of the corresponding letter encountered in the input file.
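Here is the same loop factored into a function that takes an already-opened stream. This is a sketch, and count_stream is a hypothetical name. Note that c is an int rather than a char, so it can hold every character value as well as EOF:

```c
#include <stdio.h>

/* Hypothetical helper: the question's counting loop applied to an
   already-opened input stream. Assumes contiguous letter codes (ASCII). */
void count_stream(FILE *ifp, int letter[26])
{
    int c, i;                        /* int, not char, so EOF fits */

    for (i = 0; i < 26; ++i)
        letter[i] = 0;
    while ((c = getc(ifp)) != EOF)
        if ('A' <= c && c <= 'Z')
            ++letter[c - 'A'];       /* one counter per letter */
}
```

For a quick check without a real input file, the standard tmpfile() function can create a temporary stream. Write some text to it with fputs(), call rewind(), then pass it to count_stream().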
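Regarding:
if ('A' <= c && c <= 'Z')
    ++letter[c - 'A'];
this would be much better written as:
if( isupper( c ) )
{
    letter[ c-'A' ]++;
}
where isupper() is declared in the header file ctype.h. (In the default "C" locale, isupper() is true exactly for 'A' through 'Z'; in other locales it may also accept other characters, for which c - 'A' would be out of range.)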
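Here is the isupper() variant applied to a string. This is a sketch, and count_upper_isupper is a hypothetical name. Values returned by getc() are already in the range isupper() accepts. Characters taken from a char array, however, are converted to unsigned char first, because passing a negative value other than EOF to isupper() is undefined behavior. The subtraction still assumes contiguous letters and the default "C" locale:

```c
#include <ctype.h>

/* Count uppercase letters in a string using isupper().
   Assumes the default "C" locale, where isupper() is true only for
   'A'..'Z', and contiguous letter codes (ASCII). */
void count_upper_isupper(const char *s, int letter[26])
{
    for (int i = 0; i < 26; ++i)
        letter[i] = 0;
    for (; *s != '\0'; ++s) {
        int c = (unsigned char)*s;   /* avoid negative values */
        if (isupper(c))
        {
            letter[c - 'A']++;
        }
    }
}
```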
Regarding:
for (i = 0; i < 26; ++i)
    letter[i] = 0;
this loop can be eliminated IF this:
int c, i, letter[26];
had been written as:
int c, i, letter[26] = {0};
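The sketch below shows that = {0} zero-initializes every element. The first element is set explicitly, and the remaining ones are implicitly initialized to zero. The function name is hypothetical:

```c
/* With = {0}, element 0 is set explicitly and every remaining element
   is zero-initialized, so no initialization loop is needed. */
int sum_of_fresh_counters(void)
{
    int letter[26] = {0};
    int sum = 0;

    for (int i = 0; i < 26; ++i)
        sum += letter[i];
    return sum;   /* 0, because every counter starts at zero */
}
```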
Now for your question:
letter[] is an array for the whole capital (ASCII) alphabet, where letter[0] is the counter for the letter A, ..., and letter[25] is the counter for the letter Z.
This calculation:
[ c-'A' ]
takes the ordinal value in the variable c and subtracts the ordinal value of the capital letter A. I.e., if c contains A, the result is 0 (which matches the index for the letter A in the array letter[]).
The ++ says to increment the value in the array letter[].
The overall result is that the entry in the array letter[] for the current letter in c is incremented, thereby keeping a count of the number of occurrences of each capital letter encountered in the input.
c is the character code that was read from the file. 'A' is the character code of the letter A. So c - 'A' subtracts those two codes.
This takes advantage of the fact that the character codes for the letters are sequential (as they are in ASCII). So when c is 'D', c - 'A' will be 3.
The ASCII value of 'A' is 65.
c is the character code that was read from the file.
So when we write ++letter[c - 'A'], c gives us the current character, which also has an ASCII value.
Suppose c is 'B', whose ASCII code is 66.
Then we increase the count at index (66 - 65), that is, 1.