Why does this bison code produce unexpected output?
flex code:
1 %option noyywrap nodefault yylineno case-insensitive
2 %{
3 #include "stdio.h"
4 #include "tp.tab.h"
5 %}
6
7 %%
8 "{" {return '{';}
9 "}" {return '}';}
10 ";" {return ';';}
11 "create" {return CREATE;}
12 "cmd" {return CMD;}
13 "int" {yylval.intval = 20;return INT;}
14 [a-zA-Z]+ {yylval.strval = yytext;printf("id:%s\n" , yylval.strval);return ID;}
15 [ \t\n]
16 <<EOF>> {return 0;}
17 . {printf("mistery char\n");}
18
bison code:
1 %{
2 #include "stdlib.h"
3 #include "stdio.h"
4 #include "stdarg.h"
5 void yyerror(char *s, ...);
6 #define YYDEBUG 1
7 int yydebug = 1;
8 %}
9
10 %union{
11 char *strval;
12 int intval;
13 }
14
15 %token <strval> ID
16 %token <intval> INT
17 %token CREATE
18 %token CMD
19
20 %type <strval> col_definition
21 %type <intval> create_type
22 %start stmt_list
23
24 %%
25 stmt_list:stmt ';'
26 | stmt_list stmt ';'
27 ;
28
29 stmt:create_cmd_stmt {/*printf("create cmd\n");*/}
30 ;
31
32 create_cmd_stmt:CREATE CMD ID'{'create_col_list'}' {printf("%s\n" , $3);}
33 ;
34 create_col_list:col_definition
35 | create_col_list col_definition
36 ;
37
38 col_definition:create_type ID ';' {printf("%d , %s\n" , $1, $2);}
39 ;
40
41 create_type:INT {$$ = $1;}
42 ;
43
44 %%
45 extern FILE *yyin;
46
47 void
48 yyerror(char *s, ...)
49 {
50 extern yylineno;
51 va_list ap;
52 va_start(ap, s);
53 fprintf(stderr, "%d: error: ", yylineno);
54 vfprintf(stderr, s, ap);
55 fprintf(stderr, "\n");
56 }
57
58 int main(int argc , char *argv[])
59 {
60 yyin = fopen(argv[1] , "r");
61 if(!yyin){
62 printf("open file %s failed\n" ,argv[1]);
63 return -1;
64 }
65
66 if(!yyparse()){
67 printf("parse work!\n");
68 }else{
69 printf("parse failed!\n");
70 }
71
72 fclose(yyin);
73 return 0;
74 }
75
test input file:
create cmd keeplive
{
int a;
int b;
};
test output:
root@VM-Ubuntu203001:~/test/tpp# ./a.out t1.tp
id:keeplive
id:a
20 , a;
id:b
20 , b;
keeplive
{
int a;
int b;
}
parse work!
I have two questions:
1) Why does the action at line 38 print the token ';'? For instance, "20 , a;" and "20 , b;".
2) Why does the action at line 32 print "keeplive { int a; int b; }" instead of simply "keeplive"?
Short answer:
yylval.strval = yytext;
You can't use yytext like that. The string it points to is private to the lexer and will change as soon as the flex action finishes. You need to do something like:
yylval.strval = strdup(yytext);
and then you need to make sure you free the memory afterwards.
Longer answer:
yytext is actually a pointer into the buffer containing the input. In order to make yytext work as though it were a NUL-terminated string, the flex framework overwrites the character following the token with a NUL before it does the action, and then puts the original character back once the action is done. So strdup will work fine inside the action, but outside the action (in your bison code), you now have a pointer to the part of the buffer starting with the token. That is why your actions print "a;" and "keeplive { int a; int b; }": by the time bison runs them, the NUL sits after the most recently scanned token, not after the identifier. And it gets worse later, since flex will read the next part of the source into the same buffer, and now your pointer is to random garbage. There are several possible scenarios, depending on flex options, but none of them are pretty.
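The overwrite-and-restore behaviour described above can be imitated in plain C without flex at all. The following is only a sketch of the idea, not flex's actual implementation: buf, hold, hold_pos, scan_token and show_aliasing are made-up names, and the token positions are hard-coded for the input "int a;".

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's input buffer. */
static char buf[] = "int a;\nint b;\n";
static char hold;        /* character currently replaced by a NUL */
static char *hold_pos;   /* where it belongs, or NULL before the first token */

/* Mimics the bookkeeping around a token: put back the character that
   was overwritten for the previous token, then NUL-terminate the new
   token in place. The return value plays the role of yytext. */
char *scan_token(size_t start, size_t len)
{
    if (hold_pos)
        *hold_pos = hold;
    hold_pos = buf + start + len;
    hold = *hold_pos;
    *hold_pos = '\0';
    return buf + start;
}

/* Scans "a" and then ";" while keeping both an alias and a copy. */
void show_aliasing(void)
{
    char *alias = scan_token(4, 1);   /* token "a" */
    char *copy = strdup(alias);       /* taken while the token is still valid */
    if (!copy)
        return;
    printf("in action:  alias=\"%s\" copy=\"%s\"\n", alias, copy);

    scan_token(5, 1);                 /* token ";" */
    printf("afterwards: alias=\"%s\" copy=\"%s\"\n", alias, copy);
    free(copy);
}
```

Calling show_aliasing() from a main function shows the alias reading "a" during the simulated action and "a;" after the next token has been scanned, which is the same effect as the "20 , a;" line in the question. The copy stays "a" throughout.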
So the golden rule: yytext is only valid until the end of the action. If you want to keep it, copy it, and then make sure you free the storage for the copy when you no longer need it.
In almost all the lexers I've written, the ID token actually finds the identifier in a symbol table (or puts it there) and returns a pointer into the symbol table, which simplifies memory management. But you still have essentially the same memory management issue with, for example, character string literals.
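As a rough illustration of the symbol-table approach, here is a minimal interning sketch in C. It is not taken from any particular lexer: intern, free_symbols and MAX_SYMBOLS are hypothetical names, and the linear search is only there to keep the example short (a hash table would be the more usual choice).

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup */
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size table, chosen only to keep the sketch small. */
#define MAX_SYMBOLS 256
static char *symbols[MAX_SYMBOLS];
static size_t symbol_count;

/* Returns a pointer that stays valid until free_symbols() is called.
   The same spelling always yields the same pointer.
   Returns NULL if the table is full or memory runs out. */
const char *intern(const char *text)
{
    for (size_t i = 0; i < symbol_count; i++)
        if (strcmp(symbols[i], text) == 0)
            return symbols[i];
    if (symbol_count == MAX_SYMBOLS)
        return NULL;
    char *copy = strdup(text);
    if (!copy)
        return NULL;
    symbols[symbol_count++] = copy;
    return copy;
}

/* Releases every stored name in one place, e.g. after yyparse() returns. */
void free_symbols(void)
{
    for (size_t i = 0; i < symbol_count; i++)
        free(symbols[i]);
    symbol_count = 0;
}
```

With something like this, the ID rule could assign the result of intern(yytext) to yylval.strval (assuming strval is declared const char *), and no individual bison action has to free anything because the table owns every copy.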