Lexer and parser in C++ from EBNF

Question

I need to write a lexer and a parser for a given grammar, and I have to handcraft them rather than use generators. I have done a lot of research, but I still can't figure out how to code it.

For example, I have this grammar (in EBNF):

<Letter> ::= [A-Za-z]

<IntegerLiteral> ::= <Digit> { <Digit> }

Does this need to be defined in the lexer or in the parser? And how?

I know that a lexer should read a file character by character and output tokens, which are then passed to the parser to build the parse tree. However, I am getting stuck on the actual coding.

Answer 1

What you show us looks like it defines token types, so it goes in the lexer.

The trick in writing a lexer is to take your input text (which is just a long stream of individual characters) and look at the characters one by one. Every time you look at a character, classify it according to the EBNF above (i.e., is it a Letter, or a Digit that can start an IntegerLiteral?), then generate the appropriate token.

Taken on its own, your grammar above is not very useful (Letter matches a single character, and IntegerLiteral is just a run of digits), so my guess is that you have more rules that use these ones as building blocks to make the definitions more readable. Implement those more complex rules, and write a function for detecting whether a character matches one of the sub-rules.

Whenever the current character doesn't match the type of the token you are building, finish the current token and start a new one.

That's pretty much all there is to it: you just need a few booleans (or a single enum) to keep track of which type of token you are currently building.

solution1
2 2014-04-17 10:30:42