
C++ Polynomial Tokenizer

I am currently working on creating a tokenizer that takes in a polynomial as a string and outputs an array of monomials (individual terms) within the polynomial.

Example:

input: 4x^2+3x^-2+2

output: { "4x^2", "3x^-2", "2" }

I am not sure where to start, because exceptions make polynomials a little tricky to parse. Can anyone offer any insight?

There may be some quick-and-dirty hacks here using regular expressions or pattern matching.
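
As a rough sketch of what such a regular-expression hack could look like in C++, the following uses std::regex to cut the string at every '+' or '-' that does not directly follow '^'. The function name split_monomials is made up for illustration, and the sketch assumes the input contains no whitespace and is already well formed; it does no validation.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from any library): splits a polynomial string
// into its monomials. Assumes no whitespace and well-formed input.
std::vector<std::string> split_monomials(const std::string& poly) {
    // A term is an optional leading sign, followed by one or more pieces.
    // Each piece is either a character other than '+', '^', '-', or a '^'
    // with an optional sign, so the '-' in "x^-2" stays inside the term.
    static const std::regex term(R"([+-]?(?:[^+^-]|\^[+-]?)+)");

    std::vector<std::string> out;
    for (std::sregex_iterator it(poly.begin(), poly.end(), term), end;
         it != end; ++it) {
        std::string t = it->str();
        if (t[0] == '+') {
            t.erase(0, 1);  // drop a leading '+', keep a leading '-'
        }
        out.push_back(t);
    }
    return out;
}
```

With the input from the question, split_monomials("4x^2+3x^-2+2") returns the three strings 4x^2, 3x^-2 and 2. A leading minus is kept with its term, so x^2-3x splits into x^2 and -3x.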

However, the robust way to implement this kind of parsing is with the standard tools that have been (or should have been) taught in our fine institutions of higher learning. Or at least they were in my time. I am, of course, referring to lexical analyzers and LALR(1) parser generators.

A lexical analyzer generator, such as flex, takes a list of token definitions in the form of regular expressions and generates code that tokenizes the input stream. In this case, I think the following simple flex ruleset should be sufficient for tokenizing your polynomial:

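%{
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strdup */
#include "y.tab.h"    /* token codes and yylval, generated by bison */
%}

digit         [0-9]
letter        [a-zA-Z]

%%
"+"                  { return PLUS;       }
"-"                  { return MINUS;      }
"*"                  { return TIMES;      }
"/"                  { return SLASH;      }
"^"                  { return EXPONENT;   }
{letter}+            { yylval.id = strdup(yytext);  /* parser must free() this */
                       return IDENT;      }
{digit}+             { yylval.num = atoi(yytext);
                       return NUMBER;     }
[ \t\r\n]+           { /* skip whitespace */ }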

This will do the initial task of splitting your input string into the individual elements (tokens) of the polynomial.
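
To make the flex rules less abstract, here is a hand-written C++ sketch of roughly what the generated scanner does for them: it walks the string and emits one token per operator, run of letters, or run of digits. The Kind enum, the Token struct and the tokenize function are hypothetical names chosen for this illustration; flex itself generates a yylex() function instead. Skipping whitespace and throwing on unknown characters are assumptions of this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Token kinds mirror the names returned by the flex rules.
enum class Kind { PLUS, MINUS, TIMES, SLASH, EXPONENT, IDENT, NUMBER };

struct Token {
    Kind kind;
    std::string text;  // the matched characters, comparable to yytext
};

// Hypothetical hand-written counterpart of the generated scanner.
std::vector<Token> tokenize(const std::string& in) {
    auto is_digit = [](char ch) { return std::isdigit(static_cast<unsigned char>(ch)) != 0; };
    auto is_alpha = [](char ch) { return std::isalpha(static_cast<unsigned char>(ch)) != 0; };
    auto is_space = [](char ch) { return std::isspace(static_cast<unsigned char>(ch)) != 0; };

    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const char c = in[i];
        if (is_space(c)) {            // assumption: whitespace is ignored
            ++i;
        } else if (is_digit(c)) {     // {digit}+ : take the longest run
            std::size_t j = i;
            while (j < in.size() && is_digit(in[j])) ++j;
            out.push_back({Kind::NUMBER, in.substr(i, j - i)});
            i = j;
        } else if (is_alpha(c)) {     // {letter}+ : take the longest run
            std::size_t j = i;
            while (j < in.size() && is_alpha(in[j])) ++j;
            out.push_back({Kind::IDENT, in.substr(i, j - i)});
            i = j;
        } else {                      // single-character operators
            Kind k;
            switch (c) {
                case '+': k = Kind::PLUS;     break;
                case '-': k = Kind::MINUS;    break;
                case '*': k = Kind::TIMES;    break;
                case '/': k = Kind::SLASH;    break;
                case '^': k = Kind::EXPONENT; break;
                default:
                    throw std::invalid_argument(
                        std::string("unexpected character: ") + c);
            }
            out.push_back({k, std::string(1, c)});
            ++i;
        }
    }
    return out;
}
```

For 3x^-2 this yields NUMBER(3), IDENT(x), EXPONENT, MINUS, NUMBER(2). The MINUS after EXPONENT is just a token at this stage; deciding that it belongs to the exponent rather than starting a new term is the parser's job.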

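The lexical analyzer works together with an LALR(1) parser generator, such as bison. From your grammar file, bison generates the parser itself and, when asked to (for example with bison -y -d), the y.tab.h header. That header does not contain the grammar; it declares the token codes used in the grammar, like PLUS, MINUS and all the other tokens, along with the type of yylval, so that the lexer and the parser agree on them.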

Bison takes a specification for a context-free grammar and generates a parser for it. Grammar specifications, even for something as simple as these polynomials, tend to be fairly long, so this is just a subset of the grammar specification for your polynomials:

polynomial: additive_expression;

additive_expression: additive_term
                   | additive_expression plus_or_minus additive_term
                   ;

plus_or_minus: PLUS | MINUS;

/* additive_term then fleshes out the structure of each polynomial term */

This would be supplemented, of course, with fragments of code that build a parse tree as part of the ruleset.

flex and bison have been around for a long time. They originally generated C code (hence the C fragments in my flex example), but they can now generate C++ code as well. It goes without saying that if you are not familiar with these tools, there will be a steep learning curve; but this is the time-tested way of implementing a parser for non-trivial syntax, such as your polynomials.
