What is wrong with this Bison grammar?

I'm trying to build a Bison grammar and seem to be missing something. I kept it very basic, but I am still getting a syntax error and can't figure out why:

Here is my Bison code:

%{

#include <stdlib.h>
#include <stdio.h>

int yylex(void);
int yyerror(char *s);

%}

// Define the types flex could return
%union {
    long lval;
    char *sval;
}

// Define the terminal symbol token types
%token <sval> IDENT;
%token <lval> NUM;

%%

Program: 
    Def ';' 
    ;

Def: 
    IDENT '=' Lambda { printf("Successfully parsed file"); }
    ;

Lambda: 
    "fun" IDENT "->" "end"
    ;

%%

int main(void) {
    yyparse();
    return 0;
}

int yyerror(char *s)
{
  extern int yylineno;  // defined and maintained in flex.flex
  extern char *yytext;  // defined and maintained in flex.flex

  printf("ERROR: %s at symbol \"%s\" on line %i", s, yytext, yylineno); 
  exit(2);
}

Here is my Flex code:

%{
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include "bison.tab.h"
%}

%option yylineno

ID [A-Za-z][A-Za-z0-9]*
NUM [0-9][0-9]*
HEX [$][A-Fa-f0-9]+
COMM [/][/].*$

%%

fun|if|then|else|let|in|not|head|tail|and|end|isnum|islist|isfun    {
    printf("Scanning a keyword\n");
}


{ID}    {
    printf("Scanning an IDENT\n");
    yylval.sval =  strdup( yytext );
    return IDENT;
}

{NUM}   {
    printf("Scanning a NUM\n");
    /* Convert into long to lose leading zeros */
    char *ptr = NULL;
    errno = 0;  /* strtol does not clear errno on success, so reset it first */
    long num = strtol(yytext, &ptr, 10);
    if( errno == ERANGE ) {
            printf("Number was too big\n");
            exit(1);
    }

    yylval.lval = num;
    return NUM;
}

{HEX}   {
    printf("Scanning a NUM\n");
    char *ptr = NULL;
    /* convert hex into decimal using offset 1 because of the $ */
    errno = 0;  /* strtol does not clear errno on success, so reset it first */
    long num = strtol(&yytext[1], &ptr, 16);
    if( errno == ERANGE ) {
            printf("Number was too big\n");
            exit(1);
    }

    yylval.lval = num;
    return NUM;
}


";"|"="|"+"|"-"|"*"|"."|"<"|"="|"("|")"|"->" {
    printf("Scanning an operator\n");
}

[ \t\n]+ /* eat up whitespace */


{COMM}* /* eat up one-line comments */

.   {
    printf("Unrecognized character: %s at linenumber %d\n", yytext, yylineno );
    exit(1);
}

%%

And here is my Makefile:

all: parser

parser: bison.tab.c lex.yy.c
	gcc bison.tab.c lex.yy.c -o parser -lfl

bison.tab.c bison.tab.h: bison.y
	bison -d bison.y

lex.yy.c: flex.flex bison.tab.h
	flex flex.flex

clean:
	rm -f bison.tab.h bison.tab.c lex.yy.c parser

.PHONY: all clean

Everything compiles just fine; I don't get any errors running make all.

Here is my test file:

f = fun x -> end;

And here is the output:

./parser < a0.0
Scanning an IDENT
Scanning an operator
Scanning a keyword
Scanning an IDENT
ERROR: syntax error at symbol "x" on line 1

Since x seems to be recognized as an IDENT, the rule should be correct, but I am still getting a syntax error.

I feel like I am missing something important, hopefully somebody can help me out.

Thanks in advance!

EDIT:

I tried removing the IDENT from the Lambda rule and from the test file. Now it seems to get through the line, but it still throws

ERROR: syntax error at symbol "" on line 1

after the EOF.

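Your scanner recognizes keywords (and prints a debugging line, but see below), but it never returns anything to the parser, so keywords are effectively ignored. The operator rule has the same problem (see note 1 below), so for f = fun x -> end; the parser receives the IDENT for f followed directly by the IDENT for x. Since it expected '=' after the first IDENT, it reports a syntax error at x. In your edited version, the parser receives only the IDENT for f and then end of file, which is why the error is reported at an empty symbol.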

In your bison definition file, you use "fun" (for example) as a terminal, but you never give that terminal a name the scanner can refer to. The scanner needs that name, because it has to return a token id to the parser.

To summarize, what you need is something like this:

In your grammar, before the %%:

%token T_FUN "fun"
%token T_IF "if"
%token T_THEN "then"
 /* Etc. */

In your scanner definition:

fun { return T_FUN; }
if  { return T_IF; }
then { return T_THEN; }
 /* Etc. */

A couple of other notes:

  1. Your scanner rule for operators also fails to return anything, so operators are ignored as well, which is clearly not what you want. For single-character operators, flex and bison allow a simpler solution: let the character be its own token id, which avoids having to create a token name. In the grammar, a single-quoted character represents a token id whose value is the character code; that's quite different from a double-quoted string, which is an alias for a declared token name. So you could do this:

     "=" { return '='; } /* Etc. */ 

    but it's easier to do all the single-character tokens at once:

     [;+*.<=()-] { return yytext[0]; } 

    and even easier to use a default rule at the end:

     . { return yytext[0]; } 

    which has the effect of handling unrecognized characters by returning a token id the grammar doesn't know, so the parser will report a syntax error.

    This won't work for "->", which is not a single-character token; it has to be handled the same way as the keywords.

  2. Flex will produce debugging output automatically if you use the -d flag when you generate the scanner. That's a lot easier than inserting your own debugging printouts, because you can turn it off by simply removing the -d option. (You can use %option debug instead if you don't want to change the flex invocation in your makefile.) It's also better because the output is consistent: for each match it reports the line number of the flex rule that matched and the matched text.

  3. Some minor points:

    • The pattern [0-9][0-9]* could more easily be written [0-9]+
    • The comment pattern "//".* does not need a $ lookahead at the end. Since .* always matches the longest possible sequence of non-newline characters, the first unmatched character must be either a newline or the end of the file. Moreover, the $ lookahead does not match at end of file, so a file that ends with a comment but no final newline will cause odd errors.
    • There is no point in using {COMM}*: the comment pattern does not match the newline that terminates the comment, so two comment matches can never be consecutive. And after matching a comment and the following newline, flex will go on to match the next comment anyway, so {COMM} is sufficient. (Personally, I wouldn't use the COMM abbreviation at all; IMHO it adds nothing to readability.)
