
Bison nonterminal useless in grammar, rule useless in parser

I am trying to write a compiler "from scratch" using flex and bison. I tried to find help online, but there is not much out there. I did manage to find a book: flex & bison by John Levine.

It was pretty useful, but I am stuck and don't know what to do next.

This is my flex code:

%option noyywrap 

%{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "parser.tab.h"

    extern FILE *yyin;
    extern FILE *yyout;
    int line_no = 1;

    //the function of lexer analysis. Return the token
    int yylex();
    //error function 
    void yyerror();
    //print statement function
    void print_return(char *token);

%}

%x ML_COMMENT

alphabet    [a-zA-Z]        
digit           [0-9]
alphanumeric    {alphabet}|{digit}
print       [ -~]
underscore  _
identifier      ({alphabet}|{underscore})+({alphanumeric}|{underscore})* 
integer         "0"|[0-9]{digit}*
float_number    "0"|{digit}*"."{digit}+
char        \'{print}\'

%%

"PROGRAM"       {  print_return("PROGRAM"); return PROGRAM; }

"%".*           {  print_return("COMMENT"); return COMMENT; }

"BREAK"         {  print_return("BREAK"); return BREAK; }
"VARS"          {  print_return("VARS"); return VARS; }    

"STARTMAIN"     {  print_return("STARTMAIN");  return STARTMAIN; }   
"ENDMAIN"       {  print_return("ENDMAIN"); return ENDMAIN;}

"IF"            {  print_return("IF");  return IF; }
"THEN"          {  print_return("THEN"); return THEN;}
"ELSEIF"        {  print_return("ELSEIF"); return ELSEIF; }
"ELSE"          {  print_return("ELSE"); return ELSE; }
"ENDIF"         {  print_return("ENDIF"); return ENDIF; }

"FOR"           {  print_return("FOR"); return FOR; }
"TO"            {  print_return("TO"); return TO; }
"STEP"          {  print_return("STEP"); return STEP; }
"ENDFOR"        {  print_return("ENDFOR"); return ENDFOR; }

"SWITCH"        {  print_return("SWITCH"); return SWITCH; }
"CASE"          {  print_return("CASE"); return CASE; }
"ENDSWITCH"     {  print_return("ENDSWITCH"); return ENDSWITCH; }

"RETURN"        {  print_return("RETURN"); return RETURN; }

"FUNCTION"      {  print_return("FUNCTION"); return FUNCTION; }
"ENDFUNCTION"   {  print_return("ENDFUNCTION"); return ENDFUNCTION; }

"PRINT"         {  print_return("PRINT"); return PRINT; }

"WHILE"         {  print_return("WHILE"); return WHILE;}
"ENDWHILE"      {  print_return("ENDWHILE"); return ENDWHILE;}

";"             {  print_return("QM"); return QM; }
"\n"            {  line_no++; print_return("NEWLINE"); return NEWLINE; }     
"\t"            {  print_return("INDENT"); return INDENT; }     
       
"+="            {  print_return("ADD_ASSIGN"); return ADD_ASSIGN; }        
"-="            {  print_return("SUB_ASSIGN"); return SUB_ASSIGN; }       
"/="            {  print_return("DIV_ASSIGN"); return DIV_ASSIGN; }       
"%="            {  print_return("MOD_ASSIGN"); return MOD_ASSIGN; }     
"--"            {  print_return("DEC_OP"); return DEC_OP; }      
"++"            {  print_return("INC_OP"); return INC_OP; }        
"AND"           {  print_return("AND_OP"); return AND_OP; }      
"OR"            {  print_return("OR_OP"); return OR_OP; }       
"=="            {  print_return("EQ_OP"); return EQ_OP; }      
">="            {  print_return("GE_OP"); return GE_OP; }      
"<="            {  print_return("LE_OP"); return LE_OP; }      
"!="            {  print_return("NE_OP"); return NE_OP; }     
"{"             {  print_return("L_BRACE"); return L_BRACE; }        
"}"             {  print_return("R_BRACE"); return R_BRACE; }        
","             {  print_return("COMMA"); return COMMA; }       

"="             {  print_return("ASSIGN"); return ASSIGN; }       
"("             {  print_return("L_PAR"); return L_PAR; }        
")"             {  print_return("R_PAR"); return R_PAR;}     
"["             {  print_return("L_BRACK"); return L_BRACK; }        
"]"             {  print_return("R_BRACK"); return R_BRACK;}     
"."             {  print_return("DOT"); return DOT; }        
"_"             {  print_return("UNDERSCORE"); return UNDERSCORE; }       
"-"             {  print_return("MINUS"); return MINUS; }        
"+"             {  print_return("PLUS"); return PLUS; }       
"*"             {  print_return("MUL"); return MUL; } 
":"             {  print_return("COLON"); return COLON; }

"/"             {  print_return("DIV"); return DIV; }
"<"             {  print_return("LT"); return LT; }
">"             {  print_return("GT"); return GT; }
[ ]             ; 
{identifier}    { print_return("ID"); strcpy(yylval.name, yytext);  return IDENTIFIER; } 
{integer}       { yylval.integer_val = atoi(yytext); print_return("INTEGER"); return INTEGER; }
{float_number}  { print_return("FLOAT"); return FLOAT; }
{char}          { print_return("CHAR"); return CHAR; }

    /* catch-all: kept last so it does not win ties against one-character identifiers and integers */
.               { yyerror("Unknown character"); }

%%
/*---------------------------------------------------------------------------------------------------------------------*/
void print_return(char *token)
{
    printf("Token: %s\t\t Line: %d\t\t Text: %s\n", token, line_no, yytext);    
}

This is my bison file:

%{

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include "print_console.c"

    //pointer to input file of lexer
    extern FILE *yyin; 
    //pointer to output file of lexer
    extern FILE *yyout;
    //line counter
    extern int line_no;
    //reads the input stream generates tokens
    extern int yylex();
    //temporary token save
    extern char* yytext;

    //Function declarations
    int yylex();
    void yyerror(char *message);

%}

//struct for print_console
%union 
{
    char name[500];
    int integer_val;
}

/* --------------------------------------- TOKENS ---------------------------------------*/
//starting symbol
%start PROGRAM 

%token COMMENT
%token BREAK
%token VARS

%token QM

%token STARTMAIN 
%token ENDMAIN

%token IF
%token THEN
%token ELSEIF
%token ELSE
%token ENDIF

%token FOR
%token TO
%token STEP
%token ENDFOR

%token SWITCH
%token CASE
%token ENDSWITCH

%token RETURN

%token FUNCTION
%token ENDFUNCTION

%token PRINT

%token WHILE
%token ENDWHILE

%token NEWLINE
%token INDENT

%token ADD_ASSIGN
%token SUB_ASSIGN
%token DIV_ASSIGN
%token MOD_ASSIGN
%token DEC_OP
%token INC_OP
%token AND_OP
%token OR_OP
%token EQ_OP
%token GE_OP
%token LE_OP
%token NE_OP
%token L_BRACE
%token R_BRACE
%token COMMA
%token COLON

%token ASSIGN
%token L_PAR
%token R_PAR
%token L_BRACK
%token R_BRACK
%token DOT
%token UNDERSCORE
%token MINUS
%token PLUS
%token MUL
%token DIV
%token LT
%token GT
%token FLOAT
%token CHAR

%token <name> IDENTIFIER
%token <integer_val> INTEGER

//type for access to $$ 
%type <integer_val> line int_op int_data
%type <name> calc_assignment

%%  

/* --------------------------------------- BNF GRAMMAR ---------------------------------------*/

program: program line;

line:   if_stmt {;} 
        | elseif_stmt {;} 
        | else_stmt {;} 
        | for_statement {;} 
        | function NEWLINE INDENT {;}
        | function NEWLINE indent2 {;}
        | function NEWLINE {;} 
        | function_call {;} 
        | comments NEWLINE {;} 
        | action {;}
        | print NEWLINE {;}
        | switch  NEWLINE  case NEWLINE {;}
        | dictionaries NEWLINE {;} 
        | calc_assignment NEWLINE {;}
        | NEWLINE {;}
        ;
/*--------- BREAK -------------*/
break:BREAK QM NEWLINE ;

/*--------- ACTION & indents -------------*/

indent2: INDENT INDENT;
indent3: INDENT INDENT INDENT;
indent4: INDENT INDENT INDENT INDENT;
indent5: INDENT INDENT INDENT INDENT INDENT;

action: INDENT line 
        | indent2 line 
        | indent3 line 
        | indent4 line 
        | indent5 line ;
/*--------- DATA TYPES -------------*/
data_type: CHAR
        | INTEGER 
        | IDENTIFIER;

/*--------- FUNCTIONS --------------*/
function: FUNCTION IDENTIFIER L_PAR optional_parameters R_PAR ;
end_function: ENDFUNCTION NEWLINE;

function_call: IDENTIFIER L_PAR optional_parameters R_PAR 
            | IDENTIFIER L_PAR data_type R_PAR
                | IDENTIFIER L_PAR data_type COMMA data_type R_PAR    
                | IDENTIFIER L_PAR data_type COMMA data_type COMMA data_type R_PAR;

/*------------ INSPECTORS -------------*/
inspector:IDENTIFIER operators IDENTIFIER
        |IDENTIFIER operators INTEGER
        |INTEGER operators IDENTIFIER
        |INTEGER operators INTEGER   ;

inspector_gen: inspector | inspector AND_OR_operators;

/*----------- IF & FOR STATEMENTS -------------*/

if_stmt:IF L_PAR inspector_gen R_PAR THEN NEWLINE action  ;
elseif_stmt: ELSEIF L_PAR inspector_gen R_PAR NEWLINE action  ;
else_stmt: ELSE NEWLINE action  ;
end_if_stmt:ENDIF NEWLINE  ;

for_statement: FOR IDENTIFIER COLON ASSIGN INTEGER TO INTEGER STEP INTEGER NEWLINE action;
end_for_statement: ENDFOR NEWLINE;


/*---------- SWITCH / CASE STATEMENT -----------------*/
switch: SWITCH L_PAR LT IDENTIFIER GT R_PAR NEWLINE action;

case: CASE L_PAR LT INTEGER GT R_PAR NEWLINE action;

end_switch: ENDSWITCH NEWLINE;

/*-------------- WHILE ---------------*/
while: WHILE L_PAR inspector_gen R_PAR NEWLINE action  ;
end_wile: ENDWHILE NEWLINE;

/*-------------- OPERATORS ---------------*/
operators:EQ_OP 
      | GE_OP 
      | LE_OP 
      | NE_OP 
      | DEC_OP 
      | INC_OP 
      | LT 
      | GT;

AND_OR_operators:AND_OP
        |OR_OP;

optional_parameters: IDENTIFIER 
        | optional_parameters COMMA IDENTIFIER ;

/*-------------- COMMENTS ---------------*/
comments: COMMENT;

/*-------------- PRINT ---------------*/
print: PRINT L_PAR data_type R_PAR QM;

/*-------------- MAIN ---------------*/
start_main: STARTMAIN NEWLINE action;
end_main: ENDMAIN NEWLINE  ;

/* --- DICTIONARIES --- */
dictionaries: IDENTIFIER ASSIGN L_BRACE dictionary_data R_BRACE 
        | IDENTIFIER ASSIGN IDENTIFIER L_PAR L_BRACK L_PAR dictionary_data R_PAR R_BRACK R_PAR
     IDENTIFIER ASSIGN IDENTIFIER L_PAR dictionary_data optional_parameters dictionary_data R_PAR ;

dictionary_data: data_type COLON data_type 
        |data_type COLON data_type COMMA dictionary_data 
        | data_type COMMA data_type optional_parameters 
        | IDENTIFIER ASSIGN data_type | /* empty */ ;

/* --- CALCULATE --- */
calc_assignment: IDENTIFIER ASSIGN int_op { Change($1, $3); };
    
int_op: int_data { $$ = $1; }
    | int_op PLUS int_data { $$ = $1 + $3; }
    | int_op MINUS int_data { $$ = $1 - $3; }
    | int_op MUL int_data { $$ = $1 * $3; }
    | int_op DIV int_data { $$ = $1 / $3; } ;

int_data: INTEGER { $$ = $1; } 
        | IDENTIFIER { $$ = Search($1) -> integer_val; };

%%

/* ------------------------------------------------ C FUNCTIONS -------------------------------------------- */

void yyerror(char *message){
    printf("Error: \"%s\"\t in line %d. Token = %s\n", message, line_no, yytext);
    exit(1);
}   

/* ------------------------------------------ MAIN FUNCTION --------------------------------------------- */

int main(int argc, char *argv[]){

    hashTable = (hash *) calloc(SIZE, sizeof(hash));

        int flag;

    yyin = fopen(argv[1],"r");
    //yyparse(): reads tokens, executes actions
    flag = yyparse();
    fclose(yyin);

    printf("Parsing finished successfully!\n\n");
    printf(" __________________________\n");
    Print();
    printf(" __________________________\n");

    return flag;   
}

I am stuck and don't know what to do. The compiler just does not like my code:

parser.y: warning: 9 nonterminals useless in grammar [-Wother]
parser.y: warning: 9 rules useless in grammar [-Wother]
parser.y:136.1-5: warning: nonterminal useless in grammar: break [-Wother]
  136 | break:BREAK QM NEWLINE ;
      | ^~~~~
parser.y:157.1-12: warning: nonterminal useless in grammar: end_function [-Wother]
  157 | end_function: ENDFUNCTION NEWLINE;
      | ^~~~~~~~~~~~
parser.y:177.1-11: warning: nonterminal useless in grammar: end_if_stmt [-Wother]
  177 | end_if_stmt:ENDIF NEWLINE  ;
      | ^~~~~~~~~~~
parser.y:180.1-17: warning: nonterminal useless in grammar: end_for_statement [-Wother]
  180 | end_for_statement: ENDFOR NEWLINE;
      | ^~~~~~~~~~~~~~~~~
parser.y:188.1-10: warning: nonterminal useless in grammar: end_switch [-Wother]
  188 | end_switch: ENDSWITCH NEWLINE;
      | ^~~~~~~~~~
parser.y:191.1-5: warning: nonterminal useless in grammar: while [-Wother]
  191 | while: WHILE L_PAR inspector_gen R_PAR NEWLINE action  ;
      | ^~~~~
parser.y:192.1-8: warning: nonterminal useless in grammar: end_wile [-Wother]
  192 | end_wile: ENDWHILE NEWLINE;
      | ^~~~~~~~
parser.y:217.1-10: warning: nonterminal useless in grammar: start_main [-Wother]
  217 | start_main: STARTMAIN NEWLINE action;
      | ^~~~~~~~~~
parser.y:218.1-8: warning: nonterminal useless in grammar: end_main [-Wother]
  218 | end_main: ENDMAIN NEWLINE  ;
      | ^~~~~~~~
parser.y: warning: 48 shift/reduce conflicts [-Wconflicts-sr]
parser.y: warning: 68 reduce/reduce conflicts [-Wconflicts-rr]
parser.y:141.10-29: warning: rule useless in parser due to conflicts [-Wother]
  141 | indent3: INDENT INDENT INDENT;
      |          ^~~~~~~~~~~~~~~~~~~~
parser.y:142.10-36: warning: rule useless in parser due to conflicts [-Wother]
  142 | indent4: INDENT INDENT INDENT INDENT;
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
parser.y:143.10-43: warning: rule useless in parser due to conflicts [-Wother]
  143 | indent5: INDENT INDENT INDENT INDENT INDENT;

I know that I have done something completely wrong. Please help me! I don't know how to move on.

When you are starting out with Bison (and, really, any time you are using it), you are best off writing and debugging your grammar in small pieces. That's a good habit for any project in any programming language, but it's particularly true when you lack experience. Don't implement all the operators, just implement a couple of them. Once you have that working, you can add the rest. Similarly, don't implement every statement syntax. Start with one, get that working, and then add another one. It's much easier to find an error when the haystack is not very big. Once you make that a habit, you'll find that programming actually becomes much easier.

Now, your actual problem. A non-terminal is "useless" if it's never used. In other words, if you define a non-terminal and never use it on the right-hand side of a production that can be reached from the start symbol, Bison will warn you that it was pointless to define that non-terminal. And Bison is clever enough to do that analysis recursively: if the only place a non-terminal appears is on the right-hand side of a production for a useless non-terminal, that non-terminal is also useless, and you'll get a warning for it, too. (I don't think that's an issue here, but I didn't do an extensive analysis of your code.)

So, for example, nowhere in your grammar do you do anything with the non-terminal break other than define it (break: BREAK QM NEWLINE;). I suppose you intend to add it to your statement alternatives later on, in which case you could just ignore the warning (which is why it is a warning and not an error). But, on the whole, you would have created less noise by not adding break to your grammar until you were ready to add its use as well.
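The "never used, directly or indirectly" idea can be sketched as a small reachability walk in C. This is only an illustration of the principle, not Bison's actual implementation (Bison also reports non-terminals that cannot derive any string of tokens). The miniature grammar, the non-terminal cond, and the helper reachable() are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_RHS 3

/* One production: left-hand side, then a NULL-terminated right-hand side. */
struct rule {
    const char *lhs;
    const char *rhs[MAX_RHS + 1];
};

/* Hypothetical miniature grammar; "cond" is invented for this example. */
static const struct rule rules[] = {
    { "program", { "line", NULL } },
    { "program", { "program", "line", NULL } },
    { "line",    { "print", "NEWLINE", NULL } },
    { "print",   { "PRINT", NULL } },
    { "break",   { "BREAK", "QM", "NEWLINE", NULL } }, /* never referenced */
    { "while",   { "WHILE", "cond", NULL } },          /* never referenced */
    { "cond",    { "IDENTIFIER", NULL } },             /* only used by while */
};

#define NRULES   (sizeof rules / sizeof rules[0])
#define MAX_SYMS (1 + NRULES * MAX_RHS)

/* Returns true if symbol `name` can be reached from `start` by
 * repeatedly expanding productions. */
static bool reachable(const char *start, const char *name)
{
    const char *seen[MAX_SYMS];   /* doubles as a work queue */
    size_t nseen = 0;
    seen[nseen++] = start;

    for (size_t i = 0; i < nseen; i++) {
        for (size_t r = 0; r < NRULES; r++) {
            if (strcmp(rules[r].lhs, seen[i]) != 0)
                continue;
            /* seen[i] is reachable, so everything on this right-hand side is too */
            for (size_t k = 0; rules[r].rhs[k] != NULL; k++) {
                const char *sym = rules[r].rhs[k];
                bool known = false;
                for (size_t j = 0; j < nseen; j++)
                    if (strcmp(seen[j], sym) == 0)
                        known = true;
                if (!known) {
                    assert(nseen < MAX_SYMS);
                    seen[nseen++] = sym;
                }
            }
        }
    }

    for (size_t j = 0; j < nseen; j++)
        if (strcmp(seen[j], name) == 0)
            return true;
    return false;
}
```

With this table, reachable("program", "print") is true, while break, while and cond are all unreachable. cond is the recursive case: it is used, but only by a production that is itself never used.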

Now, the shift/reduce conflicts. Unless you're lucky enough to stumble upon an obvious issue, it's really hard to figure out what causes a shift/reduce conflict without seeing the actual states with conflicts; Bison will produce a report of these states if you use the -v command-line option (for parser.y, the report is written to parser.output). There's useful information on debugging conflicts in John Levine's excellent book.

The latest Bison versions can help you even more by producing counterexamples (the -Wcounterexamples option, available since Bison 3.7). There's another good explanation of conflicts in the Bison manual, and some examples which explain how to use this new feature.

But, as it happens, I did stumble upon one obvious error. You have (in part) the following productions:

line: action | print NEWLINE
action: INDENT line | indent2 line
indent2: INDENT INDENT

There's a lot more, but that's enough to create a conflict. Leaving aside what constitutes an INDENT token, and just noting that print starts with the token PRINT, suppose we have the following input:

INDENT INDENT PRINT

Now, how can your grammar derive that? It could do this:

line -> action -> INDENT line -> INDENT action 
     -> INDENT INDENT line -> INDENT INDENT print NEWLINE

Or it could do this:

line -> action -> indent2 line -> INDENT INDENT line
     -> INDENT INDENT print NEWLINE

As I hope you know, a derivation step consists of replacing a non-terminal with one of its right-hand sides. So the above are two different derivations for the same input, which means your grammar is ambiguous. Bison insists on producing a definitive parse -- that's its entire purpose -- and if there are two possible parses for the same input, it can't do that.

Or, more precisely, it can do that, by picking which parse to use with the aid of some rules. But those rules often don't work as expected, and with an ambiguous grammar there is really no way for anyone other than the grammar's author to know which parse was intended. So Bison warns you that you have shift/reduce conflicts, and then uses its built-in rules to choose one possible parsing strategy.

Frequently, as with your grammar, when Bison applies these rules it finds that certain productions will no longer apply to any input (because the disambiguation rules chose some other production to apply). In that case, the eliminated productions become useless, and that's almost certainly an error, so Bison generates a warning about that, too.

I don't know if that's the cause of all the conflicts, but it would be good to fix that problem, and then see what is left.
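The two derivations of INDENT INDENT PRINT can also be checked mechanically. The following C sketch counts how many distinct ways line derives a token sequence under the three-production fragment quoted above. It makes some simplifying assumptions: print is reduced to the single token PRINT, indent3 through indent5 are left out, and the names count_parses and enum token are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

enum token { INDENT, PRINT, NEWLINE };

/* Counts the distinct parses of tokens[pos..n) as a `line`, for the fragment
 *   line:    action | PRINT NEWLINE
 *   action:  INDENT line | indent2 line
 *   indent2: INDENT INDENT
 */
static unsigned count_parses(const enum token *tokens, size_t n, size_t pos)
{
    assert(pos <= n);
    size_t left = n - pos;
    unsigned ways = 0;

    /* line -> PRINT NEWLINE */
    if (left == 2 && tokens[pos] == PRINT && tokens[pos + 1] == NEWLINE)
        ways += 1;

    /* line -> action -> INDENT line */
    if (left >= 1 && tokens[pos] == INDENT)
        ways += count_parses(tokens, n, pos + 1);

    /* line -> action -> indent2 line -> INDENT INDENT line */
    if (left >= 2 && tokens[pos] == INDENT && tokens[pos + 1] == INDENT)
        ways += count_parses(tokens, n, pos + 2);

    return ways;
}
```

For the sequence INDENT INDENT PRINT NEWLINE, count_parses(tokens, 4, 0) returns 2, one for each derivation. Any result above 1 means the fragment is ambiguous for that input, and a single INDENT gives exactly 1.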

It doesn't seem to me like your intent is to write a Python-like language where layout determines block structure, since you seem to be defining explicit end tokens for all your block syntaxes. It's not possible to use a context-free grammar to enforce correct indentation, so I hope that wasn't your intent.

The most usual parsing technique, for languages like C which don't consider layout as part of the grammar, is for the lexical scanner to simply skip over whitespace (tabs and spaces); since the whitespace makes no difference to the parse, there's no point confusing the grammar by forcing it to consider where the whitespace might go. That's certainly what I would suggest, but since I really have no idea what your intent was, I can't really say any more.
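In flex, skipping whitespace is usually a single rule with an empty action, for example [ \t]+ ; in place of the "\t" rule that returns INDENT. To show what that means for the parser, here is a hand-written C sketch of the same idea. It is an assumption-laden illustration, not a replacement for the flex scanner: the function next_word is invented, it splits only on spaces, tabs and newlines, and it reports a newline as a token of its own because the grammar uses NEWLINE as a terminator.

```c
#include <assert.h>
#include <string.h>

/* Copies the next token from *cursor into buf (capacity cap) and advances
 * *cursor past it. Spaces and tabs are skipped and never reported; a newline
 * is reported as the one-character token "\n". Returns 0 at end of input. */
static int next_word(const char **cursor, char *buf, size_t cap)
{
    assert(cap >= 2);

    /* skip layout: this is the whole trick */
    const char *p = *cursor + strspn(*cursor, " \t");

    size_t len;
    if (*p == '\n')
        len = 1;                       /* newline stays significant */
    else
        len = strcspn(p, " \t\n");     /* 0 when p is at the end of input */

    size_t copied = len < cap - 1 ? len : cap - 1;  /* truncate long tokens */
    memcpy(buf, p, copied);
    buf[copied] = '\0';

    *cursor = p + len;
    return len > 0;
}
```

Called repeatedly on "\t\tPRINT   x\n" and on "PRINT x\n", this yields the same three tokens (PRINT, x, and the newline), so the grammar never has to mention INDENT at all.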

Good luck with the project.
