This code really confuses me. It uses some Stanford libraries for the Vector (array) class. Can anyone tell me the purpose of int index = line[j] - 'a';
Why - 'a'?
void countLetters(string filename)
{
    Vector<int> result;
    ifstream in;
    in.open(filename.c_str());
    if (in.fail()) Error("Couldn't read '" + filename + "'");
    for (int i = 0; i < ALPHABETH_SIZE; i++)
    {
        result.add(0); // Must initialize contents of array
    }
    string line;
    while (true)
    {
        getLine(in, line);
        // Check that we got a line
        if (in.fail()) break;
        line = ConvertToLowerCase(line);
        for (int j = 0; j < line.length(); j++)
        {
            // Map 'a'..'z' to 0..25; anything else falls outside that range
            int index = line[j] - 'a';
            if (index >= 0 && index < ALPHABETH_SIZE)
            {
                int prevTotal = result[index];
                result[index] = prevTotal + 1;
            }
        }
    }
}
The purpose of the code:
Takes a filename and prints the number of times each letter of the alphabet appears in that file. Because there are 26 numbers to be printed, CountLetters needs to create a Vector. For example, if the file is:
'a' is the first lowercase letter in ASCII, so subtracting it maps 'a' to 0, 'b' to 1, and so on up to 'z' at 25.
int index = line[j] - 'a';
if (index >= 0 && index < ALPHABETH_SIZE)
These two lines of code just check whether line[j] is a lowercase letter.
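To make the subtraction concrete, here is a minimal sketch. It assumes an ASCII-compatible character set in which 'a' through 'z' are contiguous. The helper name letterIndex and the constant ALPHABET_SIZE are hypothetical and not part of the original code.

```cpp
const int ALPHABET_SIZE = 26;

// Hypothetical helper: maps 'a'..'z' to 0..25 and returns -1 for anything else.
// Assumes the lowercase letters are contiguous, as they are in ASCII.
int letterIndex(char ch)
{
    int index = ch - 'a'; // 'a' - 'a' == 0, 'b' - 'a' == 1, ..., 'z' - 'a' == 25
    if (index >= 0 && index < ALPHABET_SIZE)
    {
        return index;
    }
    return -1; // digits, spaces, punctuation and uppercase letters land here
}
```

With this helper, letterIndex('c') gives 2, while letterIndex(' ') gives -1 because a space (32 in ASCII) minus 'a' (97) is negative.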
Characters in a string are encoded using a character set... typically ASCII on hardware common in English language systems. You can see the ASCII table at http://en.wikipedia.org/wiki/ASCII
In ASCII (and most other character sets), the numbers representing letters are contiguous. So, this is the natural way to test whether the character at index j in the character array line is a lowercase letter:
line[j] >= 'a' && line[j] <= 'z'
Your program is equivalent to that. In an algebraic sense, it subtracts 'a' from both sides of each comparison (knowing that 'a' is the first lowercase letter in the character set):
line[j] - 'a' >= 'a' - 'a' && line[j] - 'a' <= 'z' - 'a'
line[j] - 'a' >= 0 && line[j] - 'a' <= 'z' - 'a'
Replacing "<= 'z' - 'a'" with an equivalent:
line[j] - 'a' >= 0 && line[j] - 'a' < ALPHABET_SIZE
where ALPHABET_SIZE is 26. This trades a dependency on knowing 'z' is the last letter in your character set for knowing how many letters are in your alphabet. Both are a little fragile, but fine if you know you're dealing with a well-known, stable character set encoding.
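One way to convince yourself that the range test and the index test agree is to compare them over every 7-bit ASCII value. This is only a sketch, assuming an ASCII-compatible encoding. The function names isLowerByRange, isLowerByIndex and testsAgree are made up for illustration.

```cpp
const int ALPHABET_SIZE = 26;

// The "natural" test: is ch between 'a' and 'z'?
bool isLowerByRange(char ch)
{
    return ch >= 'a' && ch <= 'z';
}

// The test used in the question: subtract 'a', then compare against 0 and 26.
bool isLowerByIndex(char ch)
{
    int index = ch - 'a';
    return index >= 0 && index < ALPHABET_SIZE;
}

// Returns true if both tests give the same answer for every value 0..127.
bool testsAgree()
{
    for (int code = 0; code < 128; code++)
    {
        char ch = static_cast<char>(code);
        if (isLowerByRange(ch) != isLowerByIndex(ch))
        {
            return false;
        }
    }
    return true;
}
```

On an ASCII system testsAgree() returns true, since 'z' - 'a' is 25 and "<= 25" is the same as "< 26" for integers.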
A better way to check for a letter is to use the isalpha() predicate: http://www.cplusplus.com/reference/clibrary/cctype/isalpha/
Note that you still need line[j] - 'a' to turn the letter into an index.
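As a rough sketch of how isalpha() could fit into the counting loop, the version below uses std::vector in place of the Stanford Vector and counts the letters of a single string rather than a file. The function name countLettersInLine is hypothetical. The cast to unsigned char is there because passing a negative char value to the <cctype> functions is undefined behavior.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns 26 counts, one per letter 'a'..'z', ignoring case.
std::vector<int> countLettersInLine(const std::string& line)
{
    std::vector<int> counts(26, 0);
    for (char ch : line)
    {
        unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isalpha(uc))
        {
            // Lowercase first so 'A' and 'a' share the same slot
            int index = std::tolower(uc) - 'a';
            // Extra guard: some locales treat non-ASCII characters as letters
            if (index >= 0 && index < 26)
            {
                counts[index]++;
            }
        }
    }
    return counts;
}
```

For example, countLettersInLine("Hello, World!") yields 3 at index 'l' - 'a' and 2 at index 'o' - 'a', with the comma, space and exclamation mark skipped.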