Lex: How do I prevent it from matching against substrings?

Question

For example, I'm supposed to convert "int" to "INT". But if the input contains the word "integer", I don't think it's supposed to turn into "INTeger".

If I define the rule "int" printf("INT"); it also matches "int" inside longer words. Is there a way to prevent this from happening?

Answer 1

I believe the following captures what you want.

%{
#include <stdio.h>
%}

ws                      [\t\n ]

%%

{ws}int{ws}         { printf ("%cINT%c", *yytext, yytext[4]); }
.                       { printf ("%c", *yytext); }

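To treat delimiters other than whitespace as word boundaries (punctuation, for example), you will need to either extend the definition of ws or add more specific checks. Note that the rule consumes the whitespace on both sides of the word, so it will not match "int" at the very start or end of the input, nor the second of two occurrences separated by a single space.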

Answer 2

Well, here's how I did it:

("int"[a-zA-Z0-9]+)|([a-zA-Z0-9]+"int") ECHO;
"int" printf("INT");

Better suggestions are welcome.

Answer 3

Lex will choose the rule with the longest possible match for the current input. To avoid substring matches you need to include an additional rule that produces a longer match than int whenever "int" is part of a longer word. The easiest way to do this is to add a simple rule that matches any run of letters, i.e. [a-zA-Z]+ . For the input "int" on its own, both rules match three characters, and Lex breaks the tie in favour of the rule listed first, so the int rule must come before the catch-all. The entire lex program would look like this:

%%

[\t ]+          { ECHO; /* pass whitespace through unchanged */ }
int             { printf("INT"); /* listed before the catch-all so it wins the tie */ }
[a-zA-Z]+       { ECHO; /* catch-all: the longer match keeps "integer" intact */ }

%%

int main(void)
{
    yylex();
    return 0;
}
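
To make the longest-match idea concrete, here is a rough hand-written C sketch of what the two word rules amount to. It is only an illustration, not what Lex actually generates: the function name convert_int_words and its buffer-based interface are made up for this example, and a "word" is assumed to be a maximal run of letters, mirroring [a-zA-Z]+ .

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Lex): copy `in` to `out`, rewriting only
   the whole word "int" as "INT". `cap` is the size of `out` in bytes. */
void convert_int_words(const char *in, char *out, size_t cap)
{
    size_t o = 0;

    assert(cap > 0);
    while (*in != '\0') {
        /* Longest match: measure the full run of letters before deciding. */
        size_t len = 0;
        while (isalpha((unsigned char)in[len]))
            len++;

        const char *src = in;
        if (len == 0)
            len = 1;        /* not a letter: pass one character through */
        else if (len == 3 && strncmp(in, "int", 3) == 0)
            src = "INT";    /* the run is exactly "int": rewrite it */

        /* Copy the token, always leaving room for the terminator. */
        for (size_t i = 0; i < len && o + 1 < cap; i++)
            out[o++] = src[i];
        in += len;
    }
    out[o] = '\0';
}
```

Calling convert_int_words("int integer print int", buf, sizeof buf) leaves "INT integer print INT" in buf. Because digits are not part of a word under this assumption, "int2" would still become "INT2"; widening the character class (in the sketch and in the Lex rule alike) is one way to change that.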
