How to define numbers format in Flex (lexical analyzer)?

What I need:

Acceptable > 1234 & 12.34

Error (not acceptable) > 12.34.56

Scanner.L:

      ...
%%

[0-9]+                printf("Number ");
[0-9]+"."[0-9]+       printf("Decimal_Number ");
"."                   printf("Dot ");

%%
      ...

After compiling and running:

Input :
1234    12.34    12.34.65

Output :
Number    Decimal_Number      Decimal_Number Dot Number

How can I print Error instead of Decimal_Number Dot Number (or just ignore it)?

Is it possible to define a space before and after a number as a separator?

It's often considered better to detect errors like 12.34.56 in your parser rather than your scanner. But there is also an argument that you can produce better error messages by detecting the error lexically.

If you want to do so, you can use two patterns: the first one detects only correct numbers, and the second one detects a larger set of strings, including all the erroneous strings (but not anything which could be legitimate). This relies on the matching behaviour of (f)lex: it always accepts the longest match, and if the longest token is matched by two or more rules, it uses the first matching rule.

For example, suppose you wanted to accept dots by themselves as '.', numbers as INTEGER or FLOAT tokens, and produce an error on numeric strings with more than one dot. You could do that with four rules:

  /* If the token is just a dot, match it here */
\.                             { return '.';    }
  /* Match integers without decimal points */
[[:digit:]]+                   { return INTEGER; }
  /* If the token is a number including a decimal point,
   * match it here. This pattern will also match just '.',
   * but the previous rules will be preferred. */
[[:digit:]]*\.[[:digit:]]*     { return FLOAT; }
  /* This rule matches any sequence of dots and digits.
   * That will also match single dots and correct numbers, but
   * again, the previous rules are preferred. */
[.[:digit:]]+                  { /* signal error */
                                 return BADNUMBER; }
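
The way these four rules interact can be tried out without running flex. The following is a minimal Python sketch, not flex code: it assumes that trying every pattern at the current position, keeping the longest match, and breaking ties in favour of the earliest rule is a fair model of (f)lex. The names RULES and tokenize and the DOT label (standing in for the '.' token) are invented for illustration, and [[:digit:]] is written as [0-9].

```python
import re

# Hypothetical rule table mirroring the four flex rules above, in the same order.
# These patterns contain no alternation, so Python's greedy matching yields the
# longest match for each individual rule.
RULES = [
    ("DOT", re.compile(r"\.")),
    ("INTEGER", re.compile(r"[0-9]+")),
    ("FLOAT", re.compile(r"[0-9]*\.[0-9]*")),
    ("BADNUMBER", re.compile(r"[.0-9]+")),
]


def tokenize(text):
    """Return the token names chosen by longest match, first rule on ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the earlier rule stays selected.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            pos += 1  # no rule matched: skip the character (e.g. a space)
        else:
            tokens.append(best_name)
            pos += best_len
    return tokens
```

With this model, tokenize("1234 12.34 12.34.56") gives INTEGER, FLOAT, BADNUMBER: the last rule only wins where it matches more text than any earlier rule.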

You need to be very careful with solutions like the above. For example, the last rule will match .. and ..., which might be valid tokens (or even valid sequences of . tokens).

Suppose, for example, that your language permits "range" expressions like 4 .. 17 (meaning the list of integers from 4 to 17, or some such). Your users might expect 4..17 to be accepted as a range expression, but the above will produce a BADNUMBER error, even when you add the rule

".."                           { return RANGE; }

at the beginning, because the scan reaches the 4 first, and from that earlier position 4..17 matches BADNUMBER.

In order to avoid false alerts, we need to modify the BADNUMBER rule to avoid matching strings which include two (or more) consecutive dots. And we also need to make sure that 4..17 is not lexed as 4. followed by .17. (This second problem could be avoided by insisting that . neither start nor end a numeric token, but that might annoy some users.)

So, we start with the actual dot tokens:

"."                            { return '.'; }
".."                           { return RANGE; }
"..."                          { return ELLIPSIS; }

To avoid overmatching a number followed by .., we can use flex's trailing context operator. Here, we recognize a sequence of digits terminated by a . as a number only if the string is followed by something other than a . :

[[:digit:]]+                   { return INTEGER; }
  /* Change * to + so that this rule doesn't match numbers ending with . */
[[:digit:]]*(\.[[:digit:]]+)?  { return FLOAT; }
  /* Numbers which end with dot not followed by dot */
[[:digit:]]+\./[^.]            { return FLOAT; }

Now we need to fix the error rule. First, we limit it to recognizing strings where every dot is followed by a digit. Then, similar to the above, we do match the case where there is a trailing dot not followed by another dot:

[[:digit:]]*(\.[[:digit:]]+)+  { return BADNUMBER; }
[[:digit:]]*(\.[[:digit:]]+)+\./[^.] { return BADNUMBER; }
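
To see how the revised rules behave together, here is a rough Python simulation (again, not flex code). It assumes the rules are listed in the order they were introduced above, that the longest match wins with ties going to the earliest rule, and that text matched by trailing context counts toward the match length without being consumed. The names RULES, COMPILED and tokenize and the DOT label are invented for illustration.

```python
import re

# Each rule is (token name, regex). Group 1 is the token text; anything after
# group 1 stands in for flex trailing context (r/s): it counts toward the
# length used to choose a rule but is not consumed.
RULES = [
    ("DOT",       r"(\.)"),
    ("RANGE",     r"(\.\.)"),
    ("ELLIPSIS",  r"(\.\.\.)"),
    ("INTEGER",   r"([0-9]+)"),
    ("FLOAT",     r"([0-9]*(?:\.[0-9]+)?)"),
    ("FLOAT",     r"([0-9]+\.)[^.]"),
    ("BADNUMBER", r"([0-9]*(?:\.[0-9]+)+)"),
    ("BADNUMBER", r"([0-9]*(?:\.[0-9]+)+\.)[^.]"),
]
COMPILED = [(name, re.compile(rx)) for name, rx in RULES]


def tokenize(text):
    """Return (token name, token text) pairs for the simulated rule set."""
    tokens, pos = [], 0
    while pos < len(text):
        best, best_len = None, 0
        for name, pattern in COMPILED:
            m = pattern.match(text, pos)
            # Ignore empty tokens; keep the earliest rule on equal length.
            if m and m.end(1) > pos and m.end() - pos > best_len:
                best, best_len = (name, m.group(1)), m.end() - pos
        if best is None:
            pos += 1  # no rule matched: skip the character (e.g. a space)
        else:
            tokens.append(best)
            pos += len(best[1])  # consume the token only, not the context
    return tokens
```

In this simulation, tokenize("4..17") yields INTEGER, RANGE, INTEGER, while tokenize("12.34.56") yields a single BADNUMBER. A number such as 4. at the very end of the input comes out as INTEGER followed by DOT here, because the trailing context needs one more character to match.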

That's not the duty of the lexer, but of the parser (yacc or bison). If you define . as a valid symbol then there is no surprise that

12.34.56

is tokenized as

Decimal_Number Dot Number

The point is that the parser won't have a rule that accepts that sequence of tokens, so the error will be raised later. White space is usually ignored, so forcing a space between numbers won't make sense, especially in a context where you could have 12.34+56.78, which would then not be tokenized as Decimal_Number Binary_Operator Decimal_Number because it lacks white space.
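
As a rough illustration of that division of labour, here is a small Python sketch rather than a real yacc or bison grammar. It assumes a toy grammar in which an expression is an operand followed by any number of operator and operand pairs; the names TOKEN_RE, lex, OPERANDS and is_expression, and the choice of operators, are invented for this example.

```python
import re

# Hypothetical token patterns; the first three mirror the question's scanner.
TOKEN_RE = re.compile(
    r"(?P<Decimal_Number>[0-9]+\.[0-9]+)"
    r"|(?P<Number>[0-9]+)"
    r"|(?P<Dot>\.)"
    r"|(?P<Binary_Operator>[-+*/])"
    r"|(?P<Space>\s+)"
)


def lex(text):
    """Split text into token names, discarding white space."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        if m.lastgroup != "Space":  # white space is ignored, not required
            tokens.append(m.lastgroup)
        pos = m.end()
    return tokens


OPERANDS = {"Number", "Decimal_Number"}


def is_expression(tokens):
    """Toy grammar check: operand (Binary_Operator operand)*."""
    if len(tokens) % 2 == 0:
        return False
    for i, tok in enumerate(tokens):
        if i % 2 == 0 and tok not in OPERANDS:
            return False
        if i % 2 == 1 and tok != "Binary_Operator":
            return False
    return True
```

Here lex("12.34+56.78") produces Decimal_Number, Binary_Operator, Decimal_Number without any white space and passes the grammar check, while lex("12.34.56") produces Decimal_Number, Dot, Number, which the grammar check rejects.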

Here is one way to handle your problem. Since you are using lex, remember that it runs the action of whichever rule matches the input, so you can add a rule that matches the erroneous form. Change your rules as follows:

%%

[0-9]+                        {printf("Number ");}
  /* A number containing two or more dots is reported as an error */
[0-9]+[.][0-9]*[.]+[0-9.]*    {printf("Error ");}
[0-9]+[.][0-9]+               {printf("Decimal_Number ");}
%%

Now the program works as you want.

Input :
1234    12.34    12.34.65

Output :
Number    Decimal_Number     Error
