Inheritance design for tokenizer written in C++

I'm writing a simple parser in C++ designed to parse a subset of an s-expression language.

I am trying to design my inheritance hierarchy for the tokenizer in a clean manner, but I'm running into issues with object slicing because I've been trying to avoid dynamic allocation. The reason I'd like to avoid dynamic allocation is to sidestep the issue of introducing memory leaks in my tokenizer and parser.

The overall structure is: a Parser has a Tokenizer instance. The parser calls Tokenizer::peek() which returns the token at the head of the input. I want peek() to return a Token instance by value, instead of dynamically allocating a Token of the correct derived class and returning the pointer.

More concretely, suppose there are two token types: Int and Float. Here is an example which will hopefully clarify the issue:

#include <iostream>
#include <string>

class Token {
public:
  virtual std::string str() { return "default"; }
};

template <typename T>
class BaseToken : public Token {
public:
  T value;
  BaseToken(const T &t) : value(t) {}
  virtual std::string str() {
    return std::to_string(value);
  }
};

class TokenInt : public BaseToken<int> {
public:
  TokenInt(int i) : BaseToken(i) {}
};

class TokenFloat : public BaseToken<float> {
public:
  TokenFloat(float f) : BaseToken(f) {}
};

Token peek() {
  return TokenInt(10);
}

int main() {
  Token t = peek();
  std::cout << "Token is: " << t.str() << "\n";
  return 0;
}

As is probably obvious, the output is "Token is: default" instead of "Token is: 10" because of the TokenInt being sliced down to a Token.

My question is: is there a proper inheritance structure or design pattern to accomplish this type of polymorphism without using dynamic allocation?

Expanding on my comment: you can use a boost::variant. The docs have a very good tutorial on it (http://www.boost.org/doc/libs/1_57_0/doc/html/variant.html), but here's an example of how to use it in your situation. (Note: I've added some functionality to show how to use the extremely handy static_visitor.)

boost::variant is also header-only, so no special care is needed when linking.

(Note: you could use a boost::variant directly as your Token type; however, if you encapsulate it in a class, you can hide the use of visitors inside class methods.)

#include <iostream>
#include <sstream>
#include <string>
#include <boost/variant.hpp>

typedef boost::variant<std::string, int, float> TokenData;


// Define a function overloaded on the different variant contained types:
std::string type_string(int i)
{
   return "Integer";
}

std::string type_string(std::string const& s)
{
   return "String";
}

std::string type_string(float f)
{
   return "Float";
}

// Visitors implement type specific behavior. See the boost::variant docs
// for some more interesting visitors (recursive, multiple dispatch, etc)

class TypeVisitor : public boost::static_visitor<std::string>  {
public:
   template <typename T>
   std::string operator()(T const& val) const
   {
      return type_string(val);
   }
};

// Token class - no inheritance, so no possible slicing!

class Token {
public:
   template <typename T>
   Token(const T& value):
      m_value(value)
   {}

   std::string str() const {
      // Variants by default have their stream operators defined to act
      // on the contained type. You might want to just define operator<< 
      // for the Token class (see below), but I'm copying your method  
      // signature here.

      std::stringstream sstr;
      sstr << m_value;
      return sstr.str();
   }

   std::string token_type() const {
      // Note: you can actually just use m_value.type() to get the type_info for
      // the variant's type and do a lookup based on that; however, this shows how 
      // to use a static_visitor to do different things based on type
      return boost::apply_visitor(TypeVisitor(), m_value);
   }

private:
   TokenData m_value;

   friend std::ostream& operator<<(std::ostream&, Token const&);
};

// An alternative to the "str" method
std::ostream& operator<<(std::ostream& oo, Token const& tok)
{
    return oo << tok.m_value;
}


int main(){
   Token t1(10), t2("Hello"), t3(1.5f);
   std::cout << "Token 1 is: " << t1.str() << " Type: "<< t1.token_type() << "\n";
   std::cout << "Token 2 is: " << t2.str() << " Type: "<< t2.token_type() << "\n";
   // Use Token::operator<< instead:
   std::cout << "Token 3 is: " << t3 << " Type: "<< t3.token_type() << "\n";

}

Output:

Token 1 is: 10 Type: Integer
Token 2 is: Hello Type: String
Token 3 is: 1.5 Type: Float
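
If Boost is not available but a C++17 compiler is, the same idea can probably be expressed with std::variant and std::visit from the standard library. The following is only a rough sketch under that assumption, not a drop-in replacement for the code above: the explicit constructors and the generic lambdas are my own choices for illustration, and std::variant does not provide a stream operator, so str() visits the value instead.

```cpp
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Assumption: C++17 or later. Same set of alternatives as the Boost version.
using TokenData = std::variant<std::string, int, float>;

// Token class: still no inheritance, so returning it by value cannot slice.
class Token {
public:
    // Explicit constructors (an illustrative choice) keep the conversion
    // from string literals to std::string unambiguous.
    Token(int v) : m_value(v) {}
    Token(float v) : m_value(v) {}
    Token(std::string v) : m_value(std::move(v)) {}
    Token(const char* v) : m_value(std::string(v)) {}

    // Stream whichever alternative is currently held.
    std::string str() const {
        std::ostringstream out;
        std::visit([&out](const auto& v) { out << v; }, m_value);
        return out.str();
    }

    // Pick a name based on the static type of the held alternative.
    std::string token_type() const {
        return std::visit([](const auto& v) -> std::string {
            using T = std::decay_t<decltype(v)>;
            if constexpr (std::is_same_v<T, int>) {
                return "Integer";
            } else if constexpr (std::is_same_v<T, float>) {
                return "Float";
            } else {
                return "String";
            }
        }, m_value);
    }

private:
    TokenData m_value;
};
```

With this sketch, Token(10).token_type() gives "Integer" and Token("Hello").str() gives "Hello", mirroring the Boost example.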

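To return an object by value, the compiler has to know its size at compile time, so a single return type cannot change size depending on which kind of token was matched. That leaves three options: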

  1. return a smart pointer to a base Token type as was suggested in the comments,
  2. return a union of all your token types that you then cast based on a type tag, or
  3. return a generic Token type that contains a type tag and the matched text
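
As a rough illustration of option 3, here is a minimal sketch of a single Token type that carries a type tag plus the matched text. The names TokenKind, classify, as_int and as_float are hypothetical and chosen only for this example, and the classification logic is deliberately simplistic (it assumes the input has already been split into lexemes without surrounding whitespace).

```cpp
#include <cstdlib>
#include <string>

// Hypothetical tag type; a real tokenizer would list all of its token kinds.
enum class TokenKind { Int, Float, Symbol };

// One fixed-size token type: a tag plus the matched text. There is no
// inheritance, so returning it by value cannot slice anything.
struct Token {
    TokenKind kind;
    std::string text;

    std::string str() const { return text; }

    // Conversions happen on demand; the caller is expected to check `kind` first.
    int as_int() const { return std::stoi(text); }
    float as_float() const { return std::stof(text); }
};

// Classify one lexeme by checking whether the whole string parses as a number.
// Note: strtof also accepts spellings such as "inf" and "nan", which a real
// tokenizer would probably want to treat as symbols.
Token classify(const std::string& lexeme) {
    const char* begin = lexeme.c_str();
    char* end = nullptr;

    std::strtol(begin, &end, 10);
    if (end != begin && *end == '\0') {
        return {TokenKind::Int, lexeme};
    }

    std::strtof(begin, &end);
    if (end != begin && *end == '\0') {
        return {TokenKind::Float, lexeme};
    }

    return {TokenKind::Symbol, lexeme};
}
```

The parser can then switch on kind and call as_int() or as_float() only when it needs the numeric value, so peek() can keep returning a plain Token by value.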
