
Use of regex in lexical analysis

Original post: 2013-06-02 07:30:52. Tags: regex, parsing, compiler-construction, lexical-analysis

I am trying to understand the way bottom-up parsing is implemented.

I got to the point where regular expressions are converted to NFAs and then to DFAs, and how a DFA is represented as a two-dimensional table.

The question is: wouldn't that be quite a large table, with everything in the alphabet listed in one side of it? Is it really the way it is supposed to be implemented?

The other question: as far as I know, most languages have some regex implementation out of the box. Can those regex utilities be used as a ready implementation of the lexical analysis part, so that one can go on directly to build a parse table from the output?

1 answer

I am trying to understand the way bottom-up parsing is implemented.

No, you aren't. Your question is about lexical analysis. Nothing to do with parsing.

I got to the point where regular expressions are converted to NFAs and then to DFAs, and how a DFA is represented as a two-dimensional table.

Again, this has nothing to do with parsing. It has little to do with actual lexical analysis either. It is a question about lexical analyzer generation. You are now two steps removed from your stated subject.

The question is: wouldn't that be quite a large table, with everything in the alphabet listed in one side of it?

It would be as large as it needs to be to represent the DFA, which in turn depends on the rules you specify. Not a real question.

Is it really the way it is supposed to be implemented?

There are lots of ways to represent a DFA. flex(1) provides three or four options, for example, each with a different space/time tradeoff. You would almost certainly start by implementing character classes, so that characters the rules never distinguish share a single table column, which would eliminate 'everything in the alphabet listed in one side of it' immediately.
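To make the character-class point concrete, here is a minimal sketch in Python of a table-driven DFA whose columns are classes rather than individual characters. The token set (identifiers and integers), the state names, and the helper names `char_class` and `longest_match` are all assumptions made up for illustration; this is not how flex lays out its tables.

```python
# Hypothetical toy DFA: recognizes identifiers and integer literals.
# Character classes collapse the whole alphabet into three table columns.
LETTER, DIGIT, OTHER = 0, 1, 2


def char_class(ch: str) -> int:
    """Map a character to its column in the transition table."""
    if ch.isalpha() or ch == "_":
        return LETTER
    if ch.isdigit():
        return DIGIT
    return OTHER


# States; DEAD means "no transition, stop scanning".
START, IDENT, INT, DEAD = 0, 1, 2, -1

# One row per state, one column per character class (not per character).
TRANSITIONS = [
    # LETTER  DIGIT  OTHER
    [IDENT, INT, DEAD],    # START
    [IDENT, IDENT, DEAD],  # IDENT
    [DEAD, INT, DEAD],     # INT
]

ACCEPTING = {IDENT: "IDENT", INT: "INT"}


def longest_match(text: str, pos: int = 0):
    """Run the DFA from pos; return (kind, lexeme) for the longest match, or None."""
    state = START
    last_accept = None
    i = pos
    while i < len(text):
        state = TRANSITIONS[state][char_class(text[i])]
        if state == DEAD:
            break
        i += 1
        if state in ACCEPTING:
            # Remember the most recent accepting position (maximal munch).
            last_accept = (ACCEPTING[state], text[pos:i])
    return last_accept
```

With this layout the table has three states by three classes, nine entries in total, regardless of how many characters the input alphabet contains. A real generator would compute the classes from the rules instead of hard-coding them.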

The other question: as far as I know, most languages have some regex implementation out of the box. Can those regex utilities be used as a ready implementation of the lexical analysis part, so that one can go on directly to build a parse table from the output?

Again, parsing has nothing to do with lexical analysis.
A DFA already is a 'ready implementation of lexical analysis'.
As per @Qtax's comment, a single DFA for the entire ruleset is a lot faster than a series of regular expressions. It is almost certainly more compact as well.
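For comparison, the approach the question asks about, reusing a built-in regex library as the lexer, might look like the sketch below, using Python's standard `re` module. The token rules and the `tokenize` helper are hypothetical. Note that `re` is a backtracking matcher, not a generated DFA, and it picks the first alternative that matches rather than the longest, so rule order matters.

```python
import re

# Hypothetical token rules for a toy language. Order matters: `re` tries
# alternatives left to right instead of choosing the longest match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]

# Combine all rules into one pattern with one named group per rule.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text: str):
    """Return a list of (kind, lexeme) pairs; raise on unrecognized input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":  # drop whitespace tokens
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Something like `tokenize("x = 42 + y1")` yields a flat token list that a parser could then consume. This works for small grammars, but it does not give the speed or compactness of the single generated DFA described above.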