
Matching exact word in Flex for lexical analysis

Originally posted 2014-07-01 18:11:36. Tags: regex, flex-lexer

I am trying to match an exact word for lexical analysis in Flex. Concretely, I am looking for the word class, and have tried the regex \bclass\b, which works fine at rubular. I have already tried the example given here, but for some reason Flex fails to reproduce the output from rubular. Can you explain why? And how exactly should I do it?

1 answer

Simply put, because flex is flex and www.rubular.com is "a Ruby regular expression editor" (quoting from the home page, emphasis added).

Flex regular expression syntax is explained in the flex manual; if you read that, you will see that flex interprets \b as in standard C; that is, as a backspace character.

It's important to understand how Flex works (for which it is useful to read an introductory guide, such as the one in the manual). In summary, it matches tokens successively, each one starting immediately after the previous one. Boundary assertions should not be necessary: the preceding text will already have been matched up to the start of the word, and there should be an explicit token pattern that matches longer words which happen to start with the target word's letters. Take a look at the example in the Wikipedia article, for example.