
Build a Regular Expression and Finite Automata

I need some help understanding how to take the following and make a regular expression that will be used to generate an epsilon NFA.

The alphabet is {0,1}.

The language is the set of all strings beginning with 101 and ending with 01010.

Valid strings would be:

  • 101010
  • 10101010
  • 101110101
  • 1011101010
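As a reference point for checking any regex or automaton, the language can first be written directly as a predicate. This is a minimal Python sketch; the names `in_language` and `members_up_to` are illustrative and not from the question. It brute-forces every short binary string and keeps the members:

```python
from itertools import product

PREFIX, SUFFIX = "101", "01010"


def in_language(word: str) -> bool:
    """Reference definition: a binary string starting with 101 and ending with 01010."""
    return set(word) <= {"0", "1"} and word.startswith(PREFIX) and word.endswith(SUFFIX)


def members_up_to(max_len: int) -> list[str]:
    """Enumerate every binary string of length <= max_len and keep the members."""
    found = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            word = "".join(bits)
            if in_language(word):
                found.append(word)
    return found


print(members_up_to(9))  # ['101010', '10101010', '101001010', '101101010']

for example in ["101010", "10101010", "101110101", "1011101010"]:
    print(example, in_language(example))
```

There are no members of length 7: that would need the prefix and suffix to overlap by exactly one symbol, but 101 ends in 1 while 01010 starts with 0. Under this literal definition, 101110101 is rejected because it ends in 10101, not 01010.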

I am more concerned with understanding how to make the regular expression.

The regular expression you need is pretty simple:

101010|101(0|1)*01010 (theoretical)

or

^(101010|101[01]*01010)$ (used in most programming languages; the parentheses make ^ and $ apply to both alternatives)

which means either:

  • Match 1, 0, 1, 0, 1, 0

or

  • Match 1, 0, and 1.
  • Keep matching 0 or 1, zero or more times.
  • Match 0, 1, 0, 1, 0.
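A quick way to convince yourself that the expression matches exactly this language is to compare it with the definition on every short binary string. This minimal sketch uses Python's standard `re` module; the length bound of 12 is an arbitrary choice, so it is a sanity check rather than a proof:

```python
import re
from itertools import product

# The expression from the answer, with the alternation grouped so that
# both ^ and $ apply to both alternatives.
PATTERN = re.compile(r"^(?:101010|101[01]*01010)$")


def in_language(word):
    """Reference definition of the language."""
    return word.startswith("101") and word.endswith("01010")


# Exhaustively compare the regex with the definition up to length 12.
for n in range(13):
    for bits in product("01", repeat=n):
        word = "".join(bits)
        assert bool(PATTERN.match(word)) == in_language(word), word
print("regex agrees with the definition up to length 12")

# Without the group, ^ binds only to the first alternative and $ only to the
# second, so re.search accepts any string that merely starts with 101010.
loose = re.compile(r"^101010|101[01]*01010$")
print(bool(loose.search("1010101111")))    # True (wrong)
print(bool(PATTERN.search("1010101111")))  # False
```

Using `re.fullmatch` with no anchors at all is another way to avoid the precedence problem.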

The following nondeterministic automaton should work:

[Image: diagram of the NFA]
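The following Python sketch simulates one possible epsilon-NFA for the same language. It is built directly from the two alternatives of the expression: a start state with epsilon transitions into a branch that spells 101010 and a branch for 101(0|1)*01010. The state names are hypothetical, and this is not necessarily the automaton from the diagram:

```python
# Label used for epsilon (empty-string) transitions.
EPS = ""

# Hypothetical state names: "s" is the start state, a0..a6 spell out 101010,
# b0..b8 spell out 101 (0|1)* 01010.
DELTA = {
    "s": {EPS: {"a0", "b0"}},
    "a0": {"1": {"a1"}}, "a1": {"0": {"a2"}}, "a2": {"1": {"a3"}},
    "a3": {"0": {"a4"}}, "a4": {"1": {"a5"}}, "a5": {"0": {"a6"}},
    "b0": {"1": {"b1"}}, "b1": {"0": {"b2"}}, "b2": {"1": {"b3"}},
    # b3 loops on any symbol, or nondeterministically guesses that 01010 starts here.
    "b3": {"0": {"b3", "b4"}, "1": {"b3"}},
    "b4": {"1": {"b5"}}, "b5": {"0": {"b6"}}, "b6": {"1": {"b7"}},
    "b7": {"0": {"b8"}},
}
START = "s"
ACCEPT = {"a6", "b8"}


def eps_closure(states):
    """All states reachable from `states` using only epsilon transitions."""
    closure, stack = set(states), list(states)
    while stack:
        for nxt in DELTA.get(stack.pop(), {}).get(EPS, set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(word):
    """Simulate the epsilon-NFA on `word` by tracking the set of live states."""
    current = eps_closure({START})
    for symbol in word:
        moved = set()
        for state in current:
            moved |= DELTA.get(state, {}).get(symbol, set())
        current = eps_closure(moved)
    return bool(current & ACCEPT)


for w in ["101010", "10101010", "1011101010", "1010", "101110101"]:
    print(w, accepts(w))
```

The simulation keeps the set of reachable states and takes the epsilon closure after every symbol. The separate 101010 branch is needed only because the prefix and suffix overlap in that word.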

To get an idea of what you are looking for, it is helpful to use the intersection operator (denoted & below). It does not belong to the core set of rational expressions, yet it preserves rationality: in other words, you can use it and always find a way to express the same language without it.

Using Vcsn, I get this in text mode:

In [1]: import vcsn

In [2]: vcsn.B.expression('(101[01]*)&([01]*01010)').derived_term().expression()
Out[2]: 101010+101(0+1)*01010

and this in graphical mode, showing the intermediate automaton computed using derived_term (which includes details about the "meaning" of each state, so strip is called afterwards to get something simpler to read):

[Image: graphical rendering of the automaton]
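The reason & preserves rationality can be seen from the classic product construction: run a DFA for "starts with 101" and a DFA for "ends with 01010" in lockstep, and accept when both accept. This is a different technique from Vcsn's derived_term. The sketch below is plain Python with hypothetical helper names (`prefix_dfa`, `suffix_dfa`, `product`), and it builds the suffix DFA with the usual KMP-style "longest suffix that is also a prefix of the pattern" rule:

```python
from collections import deque

ALPHABET = "01"


def prefix_dfa(prefix):
    """DFA for 'starts with prefix'. State k < len(prefix): k symbols matched;
    len(prefix): prefix seen (accepting, absorbing); None: dead state."""
    def step(k, c):
        if k is None or k == len(prefix):
            return k
        return k + 1 if prefix[k] == c else None
    return 0, step, lambda k: k == len(prefix)


def suffix_dfa(pattern):
    """DFA for 'ends with pattern'. State k is the length of the longest suffix
    of the input read so far that is also a prefix of pattern."""
    def step(k, c):
        s = pattern[:k] + c
        j = min(len(s), len(pattern))
        while j > 0 and not s.endswith(pattern[:j]):
            j -= 1
        return j
    return 0, step, lambda k: k == len(pattern)


def product(dfa1, dfa2):
    """Build the reachable part of the product automaton by breadth-first search."""
    (s1, step1, acc1), (s2, step2, acc2) = dfa1, dfa2
    start = (s1, s2)
    delta, accepting = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        if acc1(p) and acc2(q):  # intersection: both components must accept
            accepting.add((p, q))
        for c in ALPHABET:
            nxt = (step1(p, c), step2(q, c))
            delta[((p, q), c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, delta, accepting, seen


START, DELTA, ACCEPTING, STATES = product(prefix_dfa("101"), suffix_dfa("01010"))


def accepts(word):
    """Run the product DFA on a binary word."""
    state = START
    for c in word:
        state = DELTA[(state, c)]
    return state in ACCEPTING


print(len(STATES), "reachable product states")
print([w for w in ["101010", "10101010", "1011101010", "1010101111"] if accepts(w)])
```

The product is deterministic, so unlike the NFA it needs no set of live states; the price is that its state count can grow as the product of the two component sizes.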

I'd suggest a pattern that includes both the base case and the general case. You need to cover the base case of 101010, where the two patterns overlap: it starts with "101" and ends with "01010", and the last two digits of the first pattern are the first two digits of the second pattern. Then you can cover the general case of "101", any 0s or 1s, then "01010", as given by Oscar.

So the full pattern would be:

^(101010|(101[01]*01010))$
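The overlap reasoning above can be mechanized for any prefix and suffix: try each overlap length k, keep the short words where the last k symbols of the prefix equal the first k symbols of the suffix, and add the general "prefix, anything, suffix" case. `build_pattern` is a hypothetical helper written for this illustration:

```python
import re


def build_pattern(prefix, suffix, alphabet="01"):
    """Hypothetical helper: anchored regex for words over `alphabet` that
    start with `prefix` and end with `suffix`."""
    # Base cases: short words in which the prefix and suffix share k symbols.
    overlaps = [
        prefix + suffix[k:]
        for k in range(1, min(len(prefix), len(suffix)) + 1)
        if prefix[-k:] == suffix[:k]
    ]
    # General case: prefix, any number of alphabet symbols, suffix.
    general = re.escape(prefix) + "[" + re.escape(alphabet) + "]*" + re.escape(suffix)
    alternatives = [re.escape(w) for w in overlaps] + [general]
    # Group the alternation so that ^ and $ apply to every alternative.
    return "^(?:" + "|".join(alternatives) + ")$"


pattern = build_pattern("101", "01010")
print(pattern)  # ^(?:101010|101[01]*01010)$
print(bool(re.match(pattern, "101010")), bool(re.match(pattern, "1010101111")))
```

For 101 and 01010, only k = 2 satisfies the overlap test, which yields exactly the single base case 101010 described above.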

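Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM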