C++ Tokenizing mathematical expression using classes

I'm attempting to re-learn bits about C++ inheritance as well as write a program that evaluates simple mathematical expressions (as strings) from scratch for practice, but I'm running into a lot of problems. My only prior experience with lexing like this is with OCaml (and Ocamllex/yacc) and it's quite a bit different there.

Anyway, I created a simple Token class:

class Token{
    public:
        string type;
        string characters;
};

Where type is a string that tells the type of token (plus, minus, etc) and characters is the string that the token actually consists of. From this, I made the derived classes for each token. Here is the one for +:

class Plus: public Token{
    public:
        Plus(){
            type = "plus";
            characters = "+";
        }
};

Minus, Times, DividedBy, LeftParenthesis, RightParenthesis, and Number are all done the same way. However, for Number, I wanted to add a double that stores the value of the token, so it also includes the public variable double value, and its constructor sets this value. You can probably see where issues would come up here already, and I'll get to those in a bit.

I have a function that reads in a string, splits it up into substrings, and stores them in a string vector (so "3.2 +( -1.83)" will return a vector with the strings "3.2", "+", "(", "-1.83", and ")"). I have another one that turns each substring into a Token object and stores them in a Token vector (so it does stuff like tokens.push_back(LeftParenthesis());).

Now I want to take this Token vector and do stuff with it. (The goal is to use the shunting yard algorithm to put it into reverse Polish notation and then evaluate it.) However, I'm having a lot of issues, mainly with the fact that I am using a Token vector, and every element of it is a derived type of Token.

The biggest issue is that I can't access the value member of the Number tokens. I've tried other things, such as using virtual getter functions, and it still doesn't work right. Another issue is that the only way I can tell which kind of Token each element is is by reading its characters member, which seems to make the whole subclass thing pointless.

Is it possible to continue down the method I started, or should I just abandon the whole Token class and its subclasses and just work with the strings? It wouldn't be the worst, since I've been learning stuff about inheritance with all the errors I've been getting.

Edit: In response to Thomas Matthews: I was playing around with virtual methods (which didn't work either), so here is the actual Token class and Number as well:

class Token{
public:
    string type;
    string characters;
    virtual double getValue();
};

class Number: public Token{
public:
    double value;
    Number(string chars){
        type = "number";
        characters = chars;
        value = atof(chars.c_str());
    }
    virtual double getValue(){
        return value;
    }
};

And here's the main method I'm playing with, which just takes a random string I made and attempts to print it back out, but by accessing the value element of any number when doing so:

int main(){
    vector<Token> tvec = tokenize("5 + 4 - 11 * (-3.2 + .19)");
    for(vector<Token>::iterator it = tvec.begin(), end = tvec.end(); it != end; ++it)
        if(it->type == "number")
            cout << it->getValue();  // doesn't work
        else
            cout << it->characters;
    cout << endl;
    return 0;
}

The problem is that you're using a vector of Tokens (vector<Token>), so any time you try to put one of your derived classes into the vector, it creates a new Token object with values copied from the base part of your derived object and the rest stripped away. This is referred to as object slicing, and it is a subtle problem in C++ that programmers with experience in more dynamic OO languages often run into.
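To make the slicing concrete, here is a minimal sketch. The Token and Number types are simplified stand-ins for the ones in the question, and the two helper functions (valueAfterSlicing, valueThroughReference) are hypothetical names invented for this illustration. The base getValue returning 0.0 is also an assumption, since the question only declares it.

```cpp
#include <string>
#include <vector>

// Simplified stand-ins for the question's classes (illustrative only).
struct Token {
    std::string type;
    virtual ~Token() = default;
    // Assumption: the base class reports 0.0 because it has no numeric value.
    virtual double getValue() const { return 0.0; }
};

struct Number : Token {
    double value;
    explicit Number(double v) : value(v) { type = "number"; }
    double getValue() const override { return value; }
};

// Stores a Number in a vector<Token> and reads it back.
// push_back copies only the Token subobject, so the element is a plain Token.
double valueAfterSlicing(double v) {
    std::vector<Token> tokens;
    tokens.push_back(Number(v));   // sliced: Number::value is not copied
    return tokens[0].getValue();   // resolves to Token::getValue
}

// The same Number accessed through a reference to the base class.
// Nothing is copied, so virtual dispatch reaches Number::getValue.
double valueThroughReference(double v) {
    Number n(v);
    const Token& ref = n;
    return ref.getValue();
}
```

Note that the type string survives the copy because it lives in the base class, which is why the check it->type == "number" in the question still succeeds even though the value is gone.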

In C++, runtime polymorphism (calling a derived class's override through the base type) only works via pointers and references. If you want a generic vector that can hold any subclass of Token, you need to use a vector<Token *>, with all the related memory-management issues.
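One way to sidestep most of the manual memory management is to store smart pointers instead of raw ones. The following is only a sketch under some assumptions: it requires C++14 for std::make_unique, the base-class constructor and the default getValue returning 0.0 are additions not present in the question, and makeSampleTokens is a hypothetical stand-in for the question's tokenize function.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Token {
    std::string type;
    std::string characters;
    Token(std::string t, std::string c)
        : type(std::move(t)), characters(std::move(c)) {}
    virtual ~Token() = default;  // needed to delete derived objects via Token*
    virtual double getValue() const { return 0.0; }  // assumed default
};

struct Plus : Token {
    Plus() : Token("plus", "+") {}
};

struct Number : Token {
    double value;
    explicit Number(const std::string& chars)
        : Token("number", chars), value(std::stod(chars)) {}
    double getValue() const override { return value; }
};

using TokenList = std::vector<std::unique_ptr<Token>>;

// Hypothetical stand-in for tokenize(): builds the tokens of "5 + 4" by hand.
TokenList makeSampleTokens() {
    TokenList tokens;
    tokens.push_back(std::make_unique<Number>("5"));
    tokens.push_back(std::make_unique<Plus>());
    tokens.push_back(std::make_unique<Number>("4"));
    return tokens;
}

// Adds up the values of all Number tokens, ignoring every other token.
double sumOfNumbers(const TokenList& tokens) {
    double total = 0.0;
    for (const auto& tok : tokens) {
        assert(tok && "token list must not contain null pointers");
        // dynamic_cast yields nullptr unless tok really points at a Number.
        if (const auto* num = dynamic_cast<const Number*>(tok.get()))
            total += num->getValue();
    }
    return total;
}
```

Because the vector holds pointers, each element keeps its dynamic type, so the kind of token can be recovered with dynamic_cast (or a virtual function) rather than by comparing the characters string.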

I suggest you add a method that returns the Token's value as a string.

This will allow you to convert the value into a number.
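As a rough sketch of that idea, the example below uses a single, non-polymorphic Token that carries its text plus a kind tag, and converts the text to a number only on request. The TokenKind enum, the valueAsString method name, and the toNumber helper are all assumptions made for illustration; none of them appear in the question.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical tag identifying what a token is, replacing the "type" string.
enum class TokenKind { Number, Plus, Minus, Times, DividedBy, LeftParen, RightParen };

class Token {
public:
    Token(TokenKind kind, std::string text)
        : kind_(kind), text_(std::move(text)) {}

    TokenKind kind() const { return kind_; }

    // Returns the token's value as a string, e.g. "+" or "-1.83".
    const std::string& valueAsString() const { return text_; }

private:
    TokenKind kind_;
    std::string text_;
};

// Converts a number token's text to a double; rejects any other token kind.
double toNumber(const Token& tok) {
    if (tok.kind() != TokenKind::Number)
        throw std::invalid_argument("not a number token: " + tok.valueAsString());
    return std::stod(tok.valueAsString());
}
```

Since there are no subclasses here, such tokens can be stored by value in a vector<Token> without any slicing.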

Perhaps what you should be thinking about is:

  • What is the value of a Token used for?
  • Do you need to tell the token to evaluate a given expression?

One issue is that you may not want to process everything as a generic Token, because there is only limited processing you can perform on a generic token.

