How to convert to Chomsky Normal Form quickly?

So I know the basic four-step way of doing it: removing the epsilons, then the productions with fewer than two symbols on the right-hand side, and so on. However, that way takes far too long for the problems we will have to do on the test.

Here is an example:
Convert this context-free grammar to an equivalent Chomsky normal form grammar. Remember to remove all useless symbols from the grammar.

S → TaXU | STUVWXY
T → UU | abc
U → bSc | ε
V → aV | Wb
W → cW | Va
X → bT | Tc
Y → cba | ε

This is 1 of 6 questions on the test. We have 50 minutes to complete the entire test. I can do this one, but it takes me 30-40 minutes each time. Does anyone know any tricks or shortcuts to make something like this go quicker?

Thanks.

There are a few strategies you might use to reduce the time to get to an answer. First, you might try to understand the language of the grammar directly and, rather than converting the given grammar, write down a new grammar in CNF for the same language. I don't know that this is particularly useful for the example given, but it might be useful if, on a test, you recognize a grammar as corresponding to a known language, or if the language is named or otherwise described.

Otherwise, I would look for ways to simplify the grammar before attempting to remove ε. That step increases the size of your grammar, and you want it done as late as possible to avoid having to think about a bigger grammar earlier on. This has some nontrivial payoff in this example.

First, see whether every nonterminal can derive some string of terminals. This should be pretty quick. Eliminate any nonterminals that cannot, as they are not even potentially useful.

  1. If a nonterminal leads to a string of terminals, it is potentially useful.
  2. If a nonterminal leads to a string of terminals and potentially useful nonterminals, it is potentially useful.
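
The two rules above amount to a fixed-point iteration: keep marking nonterminals until nothing changes. Here is a minimal Python sketch, assuming every symbol is a single character, uppercase for nonterminals and lowercase for terminals. The function names and the string encoding of the grammar are my own choices for illustration, not anything from the question.

```python
def parse(text):
    """Parse lines like 'S -> TaXU | STUVWXY' into {lhs: [rhs, ...]}; ε becomes ''."""
    grammar = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        grammar[lhs.strip()] = [alt.strip().replace("ε", "") for alt in rhs.split("|")]
    return grammar


def generating(grammar):
    """Return the nonterminals that can derive some string of terminals."""
    useful = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in useful:
                continue
            # Rules 1 and 2: some right-hand side consists only of terminals
            # and nonterminals already known to be potentially useful.
            if any(all(s.islower() or s in useful for s in alt) for alt in alternatives):
                useful.add(lhs)
                changed = True
    return useful


def drop_nongenerating(grammar):
    """Remove non-generating nonterminals and every production mentioning them."""
    useful = generating(grammar)
    return {
        lhs: [alt for alt in alts if all(s.islower() or s in useful for s in alt)]
        for lhs, alts in grammar.items()
        if lhs in useful
    }


# The grammar from the question.
GRAMMAR = parse("""
S -> TaXU | STUVWXY
T -> UU | abc
U -> bSc | ε
V -> aV | Wb
W -> cW | Va
X -> bT | Tc
Y -> cba | ε
""")

print(sorted(generating(GRAMMAR)))
print(drop_nongenerating(GRAMMAR))
```

On a test you would do this by eye, but the loop shows why the order in which you examine the symbols does not matter: you simply repeat until no new symbol gets marked.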

For your grammar:

  • Y -> cba so Y is potentially useful
  • X -> bT -> babc so X is potentially useful
  • W and V do not lead anywhere useful; they only derive strings including themselves or each other, and neither one has proven to be potentially useful. These symbols, and any productions containing them, can be immediately discarded.
  • U -> ε so U is potentially useful
  • T -> abc so T is potentially useful
  • S -> TaXU -> abcaXU -> abcabTU -> abcababcU -> abcababc so S is potentially useful (too bad!)

This already gained us a lot. Consider the new grammar:

S → TaXU
T → UU | abc 
U → bSc | ε 
X → bT | Tc 
Y → cba | ε

Next, look for any nonterminal symbols other than S that don't appear on the right-hand side of a remaining production. We can quickly see that Y is not reachable from S in this new grammar and can remove it to get:

S → TaXU
T → UU | abc 
U → bSc | ε 
X → bT | Tc
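
The reachability check is a plain graph search from S. A small Python sketch under the same assumed encoding (single-character symbols, uppercase nonterminals, '' for ε); the helper name is hypothetical:

```python
def reachable(grammar, start="S"):
    """Return the nonterminals reachable from the start symbol."""
    seen = {start}
    stack = [start]
    while stack:
        lhs = stack.pop()
        for alt in grammar.get(lhs, []):
            for sym in alt:
                if sym.isupper() and sym not in seen:
                    seen.add(sym)
                    stack.append(sym)
    return seen


# Grammar after discarding V and W (ε is written as the empty string).
GRAMMAR = {
    "S": ["TaXU"],
    "T": ["UU", "abc"],
    "U": ["bSc", ""],
    "X": ["bT", "Tc"],
    "Y": ["cba", ""],
}

keep = reachable(GRAMMAR)
pruned = {lhs: alts for lhs, alts in GRAMMAR.items() if lhs in keep}
print(pruned)
```

Note the order of the two clean-up passes: discarding non-generating symbols first is what makes Y unreachable here, since the only production mentioning Y also mentioned V and W.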

It might be possible that the above steps could be usefully repeated to continue removing useless nonterminal symbols and productions. I think everything remaining is useful.

Now we can eliminate ε. The standard way of doing this is to remove U → ε and, for each production that uses U, add copies with ε in place of U. We have two productions that use U (S → TaXU and T → UU) and three new productions to add (S → TaX, T → U and T → ε). Our new grammar looks like this:

S → TaXU | TaX
T → UU | U | abc | ε 
U → bSc
X → bT | Tc

Repeating for T, which now has an ε-production of its own:

S → TaXU | aXU | TaX | aX
T → UU | U | abc
U → bSc
X → bT | Tc | b | c
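
For comparison, here is a sketch of the textbook all-at-once version of ε-elimination in Python: compute the set of nullable nonterminals first, then emit every variant of each right-hand side with nullable symbols kept or dropped. This is a swapped-in formulation rather than the one-symbol-at-a-time procedure used above, but it should arrive at the same grammar. The encoding (single-character symbols, '' for ε) and the function names are assumptions for illustration.

```python
from itertools import product


def nullable(grammar):
    """Return the nonterminals that can derive the empty string."""
    result = set()
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            # An alternative is nullable if all its symbols are nullable
            # (vacuously true for the empty right-hand side).
            if lhs not in result and any(all(s in result for s in alt) for alt in alts):
                result.add(lhs)
                changed = True
    return result


def remove_epsilon(grammar):
    """Drop ε-productions, adding variants with nullable symbols omitted.

    Assumes the start symbol is not nullable, as is the case here.
    """
    nulls = nullable(grammar)
    new = {}
    for lhs, alts in grammar.items():
        out = []
        for alt in alts:
            # Each nullable symbol may be kept or replaced by the empty string.
            choices = [(s, "") if s in nulls else (s,) for s in alt]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate and candidate not in out:
                    out.append(candidate)
        new[lhs] = out
    return new


GRAMMAR = {
    "S": ["TaXU"],
    "T": ["UU", "abc"],
    "U": ["bSc", ""],
    "X": ["bT", "Tc"],
}

print(sorted(nullable(GRAMMAR)))
print(remove_epsilon(GRAMMAR))
```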

We only have one production that takes one nonterminal to another (T → U), and we can eliminate that by substitution:

S → TaXU | aXU | TaX | aX
T → UU | bSc | abc
U → bSc
X → bT | Tc | b | c
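
The substitution generalizes to chains of unit productions by following them transitively. A rough Python sketch, again assuming single-character symbols and hypothetical helper names:

```python
def remove_units(grammar):
    """Replace unit productions (A -> B) with B's non-unit alternatives."""
    new = {}
    for lhs in grammar:
        # Collect every nonterminal reachable from lhs via unit productions alone.
        closure, stack = {lhs}, [lhs]
        while stack:
            current = stack.pop()
            for alt in grammar[current]:
                if len(alt) == 1 and alt.isupper() and alt not in closure:
                    closure.add(alt)
                    stack.append(alt)
        out = []
        for member in [lhs] + sorted(closure - {lhs}):
            for alt in grammar[member]:
                is_unit = len(alt) == 1 and alt.isupper()
                if not is_unit and alt not in out:
                    out.append(alt)
        new[lhs] = out
    return new


GRAMMAR = {
    "S": ["TaXU", "aXU", "TaX", "aX"],
    "T": ["UU", "U", "abc"],
    "U": ["bSc"],
    "X": ["bT", "Tc", "b", "c"],
}

print(remove_units(GRAMMAR))
```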

Now, what's left shouldn't take too long. We're past the hardest part, which is eliminating empty productions and productions deriving a single nonterminal. We just need to introduce productions for terminal symbols and for strings of terminals and nonterminals. I recommend you start with shorter strings and work your way up.

We see some terminal symbols appearing alongside nonterminals, which CNF does not allow, so you can always just add new nonterminals for them:

S → TAXU | AXU | TAX | AX
T → UU | BSC | ABC
U → BSC
X → BT | TC | b | c
A → a
B → b
C → c

Now, starting with the shortest strings (length > 2), add new symbols that derive two at a time. To save time, just work left to right. We see BSC, ABC, TAX, AXU and TAXU. We can add G → BS, H → AB, I → TA, J → XU and get:

S → IJ | AJ | IX | AX
T → UU | GC | HC
U → GC
G → BS
H → AB
I → TA
J → XU
X → BT | TC | b | c
A → a
B → b
C → c
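
If you want to sanity-check a result like this while practising, you can test that every production has the CNF shape and run CYK on a few strings. The Python sketch below assumes the same single-character encoding as before; the function names are mine. The string abcababc comes from the derivation of S shown earlier.

```python
def is_cnf(grammar):
    """True if every production is A -> BC (two nonterminals) or A -> a (one terminal)."""
    for alts in grammar.values():
        for alt in alts:
            terminal = len(alt) == 1 and alt.islower()
            pair = len(alt) == 2 and alt.isupper()
            if not (terminal or pair):
                return False
    return True


def cyk(grammar, word, start="S"):
    """CYK membership test for a grammar in CNF."""
    n = len(word)
    if n == 0:
        return False  # this grammar has no S -> ε production
    # table[i][l] holds the nonterminals deriving word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        for lhs, alts in grammar.items():
            if ch in alts:
                table[i][1].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, alts in grammar.items():
                    for alt in alts:
                        if len(alt) == 2 and alt[0] in left and alt[1] in right:
                            table[i][length].add(lhs)
    return start in table[0][n]


CNF = {
    "S": ["IJ", "AJ", "IX", "AX"],
    "T": ["UU", "GC", "HC"],
    "U": ["GC"],
    "G": ["BS"],
    "H": ["AB"],
    "I": ["TA"],
    "J": ["XU"],
    "X": ["BT", "TC", "b", "c"],
    "A": ["a"],
    "B": ["b"],
    "C": ["c"],
}

print(is_cnf(CNF))
print(cyk(CNF, "abcababc"))
```

A couple of accepted and rejected strings will not prove equivalence with the original grammar, but it catches most slips in the hand conversion.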

Now, did this take less than the 8 minutes you'd have for the problem (6 problems, 50 minutes) in a testing situation? That's a bit tight. It certainly took me longer to type the explanation. Crossing out symbols and productions should be quick, but adding productions means writing them down, which takes some time.
