
Arithmetic on C++ strings

This code really confuses me. It uses some Stanford libraries for the Vector (array) class. Can anyone tell me the purpose of int index = line[j] - 'a'; and in particular, why - 'a'?

void countLetters(string filename)
{
Vector<int> result;

ifstream in;  // the rest of the function refers to this stream as "in"
in.open(filename.c_str());
if (in.fail()) Error("Couldn't read '" + filename + "'");

for (int i = 0; i < ALPHABETH_SIZE; i++)
{
    result.add(0);  // Must initialize contents of array
}

string line;
while (true)
{
    getLine(in, line);
    // Check that we got a line
    if (in.fail()) break;

    line = ConvertToLowerCase(line);
    for (int j = 0; j < line.length(); j++)
    {
        int index = line [j] - 'a';
        if (index >= 0 && index < ALPHABETH_SIZE)
        {
            int prevTotal = result[index];
            result[index] = prevTotal +1;
        }
    }
}
}

The purpose of the code:

Takes a filename and prints the number of times each letter of the alphabet appears in that file. Because there are 26 numbers to be printed, CountLetters needs to create a Vector. For example, if the file is:

'a' is the first of the lowercase letters in ASCII.

int index = line [j] - 'a'; if (index >= 0 && index < ALPHABETH_SIZE)

These two lines of code just check whether line[j] is a lowercase letter, and at the same time turn it into an index from 0 to 25.
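To make the subtraction concrete, here is a minimal sketch in standard C++ rather than the Stanford libraries. The helper name letterIndex and the constant ALPHABET_SIZE are made up for illustration, and the sketch assumes an encoding such as ASCII in which 'a' through 'z' are consecutive.

```cpp
#include <cassert>

// Hypothetical constant for this sketch: the number of letters 'a'..'z'.
const int ALPHABET_SIZE = 26;

// Hypothetical helper, not part of the original program.
// Returns 0 for 'a', 1 for 'b', ..., 25 for 'z', and -1 for anything else.
// Assumes 'a'..'z' have consecutive codes, as they do in ASCII.
int letterIndex(char c)
{
    // Both operands are promoted to int, so this is ordinary integer
    // subtraction of character codes (in ASCII, 'c' - 'a' is 99 - 97).
    int index = c - 'a';

    if (index >= 0 && index < ALPHABET_SIZE)
    {
        return index;
    }
    return -1;  // not a lowercase letter
}
```

Under ASCII, letterIndex('c') gives 2, while an uppercase letter, a digit or a space falls outside the 0 to 25 range and gives -1.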

Characters in a string are encoded using a character set... typically ASCII on hardware common in English language systems. You can see the ASCII table at http://en.wikipedia.org/wiki/ASCII

In ASCII (and most other character sets), the numbers representing letters are contiguous. So, this is the natural way to test whether the character at index j in character-array line is a letter:

line[j] >= 'a' && line[j] <= 'z'

Your program is equivalent to that. In an algebraic sense, it subtracts 'a' from both sides of each comparison (knowing that 'a' is the first lowercase letter in the character set):

line[j] - 'a' >= 'a' - 'a' && line[j] - 'a' <= 'z' - 'a'

line[j] - 'a' >= 0 && line[j] - 'a' <= 'z' - 'a'

Replacing "<= 'z' - 'a'" with an equivalent:

line[j] - 'a' >= 0 && line[j] - 'a' < ALPHABET_SIZE

where ALPHABET_SIZE is 26. This trades a dependency on knowing that 'z' is the last lowercase letter for a dependency on knowing how many letters there are. Both are a little fragile, but fine if you know you're dealing with a well-known, stable character set encoding.
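If you want to convince yourself that the two tests really agree, a small check like the following should do it. This is only a sketch: the function names are invented here, and the static_assert encodes the assumption that the letters are contiguous, so it would be expected to fail to compile on an encoding such as EBCDIC, where they are not.

```cpp
#include <cassert>
#include <climits>

// Assumption made explicit: 'a'..'z' are exactly 26 consecutive codes.
static_assert('z' - 'a' + 1 == 26, "letters are not contiguous in this encoding");

// The "natural" range test.
bool isLowerByRange(char c)
{
    return c >= 'a' && c <= 'z';
}

// The subtract-'a' test used by the program in the question.
bool isLowerByIndex(char c)
{
    int index = c - 'a';
    return index >= 0 && index < 26;
}

// Hypothetical checker: compares both tests for every possible char value.
bool testsAgree()
{
    for (int v = CHAR_MIN; v <= CHAR_MAX; ++v)
    {
        char c = static_cast<char>(v);
        if (isLowerByRange(c) != isLowerByIndex(c))
        {
            return false;
        }
    }
    return true;
}
```

The advantage of the subtraction form is that the value it computes doubles as the array index, which is why the original code uses it.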

A better way to check for a letter is to use the isalpha() predicate: http://www.cplusplus.com/reference/clibrary/cctype/isalpha/
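For what it's worth, here is roughly how the counting loop could look with isalpha(), again in plain standard C++ instead of the Stanford Vector. The function name countLettersInText is hypothetical, and it takes the text directly rather than a filename to keep the sketch short. Note that isalpha() and tolower() expect a value representable as unsigned char, hence the cast, and that in some locales isalpha() may accept characters outside 'a'..'z', so the range check on the index is kept as a guard.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical function for illustration: counts how often each letter
// occurs in text, ignoring case. counts[0] is 'a', counts[25] is 'z'.
std::array<int, 26> countLettersInText(const std::string& text)
{
    std::array<int, 26> counts{};  // value-initialized, so all zeros

    for (char ch : text)
    {
        // Passing a negative char to isalpha/tolower is undefined behavior,
        // so convert to unsigned char first.
        unsigned char uc = static_cast<unsigned char>(ch);

        if (std::isalpha(uc))
        {
            // Still assumes 'a'..'z' are contiguous when computing the index.
            int index = std::tolower(uc) - 'a';
            if (index >= 0 && index < 26)
            {
                ++counts[index];
            }
        }
    }
    return counts;
}
```

Calling countLettersInText("Hello, World!") should give 3 for 'l' and 2 for 'o', with the punctuation and space skipped.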
