简体   繁体   中英

If NFA and DFA were equivalent, why would we have two engines for regex?

From Mastering Regular Expressions 3e :

As a result, broadly speaking, there are three types of regex engines:

  • DFA (POSIX or not—similar either way)
  • Traditional NFA (most common: Perl, .NET, PHP, Java, Python, . . . )
  • POSIX NFA

From the Theory of Computation: Formal Languages, Automata, and Complexity :

For each NFA, there is a DFA that accepts exactly the same language.

Can I argue that NFA and DFA are the same thing? Or even though their ability to recognize patterns are equivalent, but they are still different in some way?

There are two things you're missing:

  1. The "traditional NFA" implementations actually include abilities beyond the strict computer science definition of an NFA.

  2. Performance characteristics are a thing to care about, even given two implementations that cover the same answer.

The net effect is that the backtracking implementations (I like that name better than "traditional NFA") are slightly more expressive than DFA implementations because they can match regexes like (\\w{3,})\\1 , which matches three or more word characters repeated twice (something that can't be matched by a DFA). At the same time, DFA implementations are guaranteed O(n) in input length, but it is very easy to write regexes that have O(n^2) or worse behavior when presented with a string they don't match. (See https://swtch.com/~rsc/regexp/regexp1.html )

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM