Why does this bison code produce unexpected output?

flex code:

%option noyywrap nodefault yylineno case-insensitive
%{
#include "stdio.h"
#include "tp.tab.h"
%}

%%
"{"             {return '{';}
"}"             {return '}';}
";"             {return ';';}
"create"        {return CREATE;}
"cmd"           {return CMD;}
"int"           {yylval.intval = 20;return INT;}
[a-zA-Z]+       {yylval.strval = yytext;printf("id:%s\n" , yylval.strval);return ID;}
[ \t\n]
<<EOF>>         {return 0;}
.               {printf("mistery char\n");}

bison code:

  1 %{
  2 #include "stdlib.h"
  3 #include "stdio.h"
  4 #include "stdarg.h"
  5 void yyerror(char *s, ...);
  6 #define YYDEBUG 1
  7 int yydebug = 1;
  8 %}
  9 
 10 %union{
 11     char *strval;
 12     int intval;
 13 }
 14 
 15 %token <strval> ID
 16 %token <intval> INT
 17 %token CREATE
 18 %token CMD
 19 
 20 %type <strval> col_definition
 21 %type <intval> create_type
 22 %start stmt_list
 23 
 24 %%
 25 stmt_list:stmt ';'
 26 | stmt_list stmt ';'
 27 ;
 28 
 29 stmt:create_cmd_stmt         {/*printf("create cmd\n");*/}
 30 ;
 31 
 32 create_cmd_stmt:CREATE CMD ID'{'create_col_list'}'    {printf("%s\n" , $3);}
 33 ;
 34 create_col_list:col_definition
 35 | create_col_list col_definition
 36 ;
 37 
 38 col_definition:create_type ID ';' {printf("%d , %s\n" , $1, $2);}
 39 ;
 40 
 41 create_type:INT {$$ = $1;}
 42 ;
 43 
 44 %%
 45 extern FILE *yyin;
 46 
 47 void
 48 yyerror(char *s, ...)
 49 {
 50     extern yylineno;
 51     va_list ap;
 52     va_start(ap, s);
 53     fprintf(stderr, "%d: error: ", yylineno);
 54     vfprintf(stderr, s, ap);
 55     fprintf(stderr, "\n");
 56 }
 57 
 58 int main(int argc , char *argv[])
 59 {
 60     yyin = fopen(argv[1] , "r");
 61     if(!yyin){
 62         printf("open file %s failed\n" ,argv[1]);
 63         return -1;
 64     }
 65 
 66     if(!yyparse()){
 67         printf("parse work!\n");
 68     }else{
 69         printf("parse failed!\n");
 70     }
 71 
 72     fclose(yyin);
 73     return 0;
 74 }
 75

Test input file:

create cmd keeplive
{
    int a;
    int b;
};

Test output:

root@VM-Ubuntu203001:~/test/tpp# ./a.out t1.tp 
id:keeplive
id:a
20 , a;
id:b
20 , b;
keeplive
{
    int a;
    int b;
}
parse work!

I have two questions:

1) Why does the action at line 38 print the token ';'? For instance, "20 , a;" and "20 , b;".

2) Why does the action at line 32 print "keeplive { int a; int b; }" instead of simply "keeplive"?

Short answer:

yylval.strval = yytext;

You can't use yytext like that. The string it points to is private to the lexer and will change as soon as the flex action finishes. You need to do something like:

yylval.strval = strdup(yytext);

and then you need to make sure you free the memory afterwards.


Longer answer:

yytext is actually a pointer into the buffer containing the input. To make yytext behave like a NUL-terminated string, the flex framework overwrites the character following the token with a NUL before it runs the action, and restores the original character afterwards (in practice, when it starts scanning the next token). So strdup will work fine inside the action, but outside the action (in your bison code) you are left with a pointer into the buffer at the position where the token started, with no terminator at the end of the token. That is what your output shows: by the time the action at line 38 runs, the parser has also read the ';', so the NUL now sits after the ';' and $2 prints as "a;". Likewise, when the action at line 32 runs, the NUL sits after the closing '}', so $3 prints everything from "keeplive" up to that brace. And it gets worse later, since flex will read the next part of the source into the same buffer, and then your pointer points to random garbage. There are several possible scenarios, depending on flex options, but none of them are pretty.
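The effect can be reproduced without flex. The following is only a toy model of the mechanism described above, not flex's actual code: the names toy_lexer and toy_next are made up for illustration, and the caller supplies the token length instead of the scanner matching a pattern.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scanner's state: one writable input buffer
   plus a record of the character that was temporarily replaced by NUL. */
struct toy_lexer {
    char *buf;       /* writable input buffer */
    size_t pos;      /* offset at which the next token starts */
    char *hold_at;   /* location currently holding the temporary NUL */
    char hold_char;  /* character that the NUL replaced */
};

/* Return a pointer to the next token of length len, NUL-terminated in
   place, after restoring the character clobbered for the previous token. */
static char *toy_next(struct toy_lexer *lx, size_t len)
{
    if (lx->hold_at != NULL)
        *lx->hold_at = lx->hold_char;  /* undo the previous NUL */

    char *text = lx->buf + lx->pos;
    lx->hold_at = text + len;
    lx->hold_char = *lx->hold_at;
    *lx->hold_at = '\0';               /* make text look like a C string */
    lx->pos += len;
    return text;
}
```

If the buffer holds "a;" and the pointer returned for the one-character token "a" is kept, it reads as "a" until toy_next is called for the ';'. After that call the same pointer reads as "a;", which mirrors the "20 , a;" line in the question's output.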

So the golden rule: yytext is only valid until the end of the action. If you want to keep it, copy it, and then make sure you free the storage for the copy when you no longer need it.
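One way to follow that rule is a small copy helper. This is a sketch rather than anything flex provides: save_text is a hypothetical name, and it is plain malloc plus memcpy, so it behaves like strdup but takes the length explicitly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of the first len bytes of text,
   NUL-terminated, or NULL if allocation fails. Intended to be called from
   a flex action, e.g. save_text(yytext, yyleng). The caller owns the
   result and must free() it. */
static char *save_text(const char *text, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

In the question's lexer the ID rule would then assign yylval.strval = save_text(yytext, yyleng); and the bison actions at lines 32 and 38 would call free($3) and free($2) respectively once they have printed the value.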

In almost all the lexers I've written, the ID rule actually looks the identifier up in a symbol table (adding it if it isn't there yet) and returns a pointer into the symbol table, which simplifies memory management. But you still have essentially the same memory management issue with, for example, character string literals.
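A minimal sketch of that symbol-table approach might look like the following. Everything here is an assumption made for illustration: the names symbol, intern and free_symbols are invented, and a linked list stands in for the hash table a real compiler would more likely use.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table: a singly linked list of interned names. */
struct symbol {
    struct symbol *next;
    char name[];  /* flexible array member; the entry owns its text */
};

static struct symbol *symbols = NULL;

/* Return the table's entry for name, adding it first if it is new.
   The returned pointer stays valid until free_symbols() is called. */
static struct symbol *intern(const char *name)
{
    for (struct symbol *s = symbols; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;

    size_t len = strlen(name);
    struct symbol *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s->name, name, len + 1);  /* copies the terminating NUL too */
    s->next = symbols;
    symbols = s;
    return s;
}

/* Release every entry at once, for example after parsing has finished. */
static void free_symbols(void)
{
    while (symbols != NULL) {
        struct symbol *next = symbols->next;
        free(symbols);
        symbols = next;
    }
}
```

With this design the ID rule would call intern(yytext) inside the action, and the %union member for ID would be a struct symbol * rather than a char *. Individual grammar actions then never free identifier text; the whole table is released in one place.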
