
How to generate random programs from BNF

I know my question sounds a little vague, but I could not find any tutorials online. I am not asking for an answer so much as an explanation. An example of the BNF:

<prog>       ::= "int main() { <stat_list> return 0; }"
<stat_list>  ::= <stat>
         | <stat_list> <stat>
<stat>       ::= <cmpd_stat>
         | <if_stat>
         | <iter_stat>
         | <assgn_stat>
         | <decl_stat>
<cmpd_stat>  ::= { <stat_list> }
<if_stat>    ::= if ( <exp> ) <stat>
         | if ( <exp> ) <cmpd_stat>
         | if ( <exp> ) <stat> else <stat>
         | if ( <exp> ) <cmpd_stat> else <stat>
         | if ( <exp> ) <stat> else <cmpd_stat>
         | if ( <exp> ) <cmpd_stat> else <cmpd_stat>

What would be the easiest way to turn this into Python so that my program creates a random program following the rules above? Any help or links to useful websites would be greatly appreciated.

NLTK has a package for grammars. It is normally used for sentence analysis, but nothing stops you from using it to create a "program" that follows those rules.

I think NLTK only lets you define a context-free grammar, so here is a little example I did:

from nltk import CFG
from nltk.parse.generate import generate

# Define the grammar from a string.
# There are other ways to define it, but this is the only one I know xD

grammar = CFG.fromstring(""" S -> NP VP
  VP -> V NP
  V -> "mata" | "roba"
  NP -> Det N | NP NP
  Det -> "un" | "el" | "con" | "a" | "una"
  N -> "bebé" | "ladrón" | "Obama" | "perrete" | "navastola" | "navaja" | "pistola" """)

# This grammar creates sentences like:
#     el bebé roba a Obama
#     ("the baby robs Obama", in Spanish)

# productions() lists the rules of the grammar; it does not generate sentences
print(grammar.productions())

# generate() enumerates the sentences the grammar can derive.
# depth=5 bounds the depth of the derivation tree, not the number of words.
for sentence in generate(grammar, depth=5):
    print(' '.join(sentence))
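
Note that enumerating every sentence is not the same as picking one at random. A minimal sketch of random sampling, without NLTK, might look like the following. It assumes a hypothetical dict encoding of part of the question's BNF: `GRAMMAR`, `expand` and `random_program` are illustrative names, the six `<if_stat>` alternatives are collapsed to two (since `<stat>` already covers `<cmpd_stat>`), and `EXP`, `ASSIGN;`, `DECL;` and `ITER;` are placeholder terminals because the question does not show those rules. The depth cutoff is a simple heuristic that happens to terminate for this grammar, not a general guarantee.

```python
import random

# Hypothetical encoding: nonterminal -> list of alternatives (lists of symbols).
# EXP, ASSIGN;, DECL; and ITER; stand in for rules the question does not show.
GRAMMAR = {
    "<prog>": [["int main() {", "<stat_list>", "return 0; }"]],
    "<stat_list>": [["<stat>"], ["<stat_list>", "<stat>"]],
    "<stat>": [["ASSIGN;"], ["DECL;"], ["ITER;"], ["<cmpd_stat>"], ["<if_stat>"]],
    "<cmpd_stat>": [["{", "<stat_list>", "}"]],
    "<if_stat>": [
        ["if (", "EXP", ")", "<stat>"],
        ["if (", "EXP", ")", "<stat>", "else", "<stat>"],
    ],
}


def expand(symbol, rng, depth=0, max_depth=8):
    """Expand one symbol into a list of terminal strings."""
    if symbol not in GRAMMAR:  # terminal: emit it unchanged
        return [symbol]
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Budget spent: only allow the alternative with the fewest nonterminals.
        fewest = min(alternatives, key=lambda alt: sum(s in GRAMMAR for s in alt))
        alternatives = [fewest]
    chosen = rng.choice(alternatives)
    out = []
    for s in chosen:
        out.extend(expand(s, rng, depth + 1, max_depth))
    return out


def random_program(seed=None, max_depth=8):
    """Return one randomly derived program as a single string."""
    rng = random.Random(seed)
    return " ".join(expand("<prog>", rng, 0, max_depth))


print(random_program(seed=1))
```

Passing a seed makes the output reproducible, which helps when a generated program exposes a bug you want to replay.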

You can do this by abusing a parser, turning it into a generator.

First, build a recursive descent parser for your language. (See my SO answer on how to do just that.) Pause while you read that... I now assume you understand how to do that.

You'll note that such a parser is full of calls from the parser function for one grammar rule to other functions for other grammar rules or primitive token matchers.

What you want to do is modify each call so that, before it is made, it decides with some low probability that it will return "true", provided some alternative is still available in the function. If a call decides on false, control simply passes to another part of the parser. If a call decides on true, it actually makes the call; the callee must now act in a way that will return true and generate the corresponding source code. At some point, this will force a call to a token reader to return true; the token reader gets replaced by a print function that emits a random token. What actually happens when you do this is that calls made to decide whether something is true simply become calls; we no longer need a return status, because the called function must return true. This changes our functions-returning-bools into procedures-returning-void. See the example below.

Let's try an example with this simple grammar for a simple programming language p:

p = s ;
s = v '=' e ;
s = 'if' e 'then' s ;
e = v ;
e = v '+' n ;

OK, our recursive descent parser for p (I'm not a Python guy, so this is pseudocode):

function p() { return s(); } // no alternatives
function s() { if v()
               then if match("=")
                    then return e()
                    else return false;
               else if match("if")
                    then if e()
                         then if match("then")
                              then return s()
                              else return false;
                         else return false;
                    else return false;
              }
function e() { if v()
               then if match("+")
                    then return n()
                    else return true
               else return false
             }
function v() { return match_variable_name(); }
function n() { return match_integer_constant(); }
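
For readers who do want Python, here is one possible translation of the pseudocode parser, offered as a sketch rather than the answer's own code. Several details are assumptions that the pseudocode leaves open: the `Parser` class and `parses` helper are hypothetical names, tokens are taken to be separated by whitespace, variable names are identifiers other than the keywords `if` and `then`, and `p()` additionally checks that all input was consumed.

```python
import re

KEYWORDS = {"if", "then"}  # assumption: keywords cannot be variable names


class Parser:
    def __init__(self, text):
        self.tokens = text.split()  # assumption: whitespace-separated tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, literal):
        """Consume the next token if it equals `literal`."""
        if self.peek() == literal:
            self.pos += 1
            return True
        return False

    def v(self):  # match_variable_name()
        tok = self.peek()
        if tok is not None and tok not in KEYWORDS and re.fullmatch(r"[A-Za-z_]\w*", tok):
            self.pos += 1
            return True
        return False

    def n(self):  # match_integer_constant()
        tok = self.peek()
        if tok is not None and re.fullmatch(r"\d+", tok):
            self.pos += 1
            return True
        return False

    def e(self):  # e = v ; e = v '+' n ;
        if not self.v():
            return False
        if self.match("+"):
            return self.n()
        return True

    def s(self):  # s = v '=' e ; s = 'if' e 'then' s ;
        if self.v():
            return self.match("=") and self.e()
        if self.match("if"):
            return self.e() and self.match("then") and self.s()
        return False

    def p(self):  # p = s ; plus an end-of-input check (an addition)
        return self.s() and self.pos == len(self.tokens)


def parses(text):
    return Parser(text).p()


print(parses("if a + 1 then x = b"))  # True
print(parses("x = + 1"))              # False
```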

OK, now let's force the calls to decide whether they are going to succeed, using a coin flip function that randomly returns true or false. Any construct of the form:

          if <testsomething> then <action x> else <action y>

gets turned into:

          if flip() then  { <testsomething> <action x> } else <action y>

and any construct of the form:

          if  <testsomething> then <action x> else return false

gets turned into:

          { <testsomething>; <action x> }

because it must succeed if we are to generate a parsable program.

If testsomething is a function call to another grammar rule, we leave it alone. Function calls to primitive token matchers get turned into print statements: if testsomething is "match(Q)", then replace it by "print(Q)"; this is what actually generates a piece of the program.

procedure p() { s(); } // no choice, this has to succeed
procedure s() { if flip() // flip == true --> v must succeed
                then { v();
                       print("="); // because if no match, procedure fails
                       e();
                     }
                else { print("if");  // if we get here, must succeed
                       e();
                       print("then"); // because match("then") must succeed
                       s();
                     }
              }
procedure e() { v(); // because there are no alternatives
                if flip() then { print("+");
                                 n();
                               }
                else { }
              }
procedure v() { print_variable_name(); }
procedure n() { print_integer_constant(); }
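
The same generator could be written in Python roughly as follows. This is a sketch under assumptions: `generate_p` is a hypothetical name, tokens are collected in a list instead of printed so the result can be returned, and the pool of variable names and the range of integer constants are arbitrary choices.

```python
import random


def generate_p(seed=None, p_true=0.5):
    """Generate one random sentence of the language p as a string."""
    rng = random.Random(seed)
    out = []  # emitted tokens; stands in for print()

    def flip():
        return rng.random() < p_true

    def v():  # was match_variable_name(); the name pool is an assumption
        out.append(rng.choice(["x", "y", "z"]))

    def n():  # was match_integer_constant(); the range is an assumption
        out.append(str(rng.randint(0, 99)))

    def e():
        v()  # no alternatives, so this must happen
        if flip():
            out.append("+")
            n()

    def s():
        if flip():  # s = v '=' e
            v()
            out.append("=")
            e()
        else:       # s = 'if' e 'then' s
            out.append("if")
            e()
            out.append("then")
            s()

    def p():
        s()

    p()
    return " ".join(out)


for seed in range(3):
    print(generate_p(seed))
```

Every procedure returns nothing: once a branch is chosen it has to emit its tokens, which is exactly the change from functions-returning-bools to procedures-returning-void.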

Note that the token recognizers for variable names and integer constants now become print procedures that print random variable names/constants. This is essentially just pushing "flip" into those procedures, too.

Now this may print arbitrarily long programs, because flip may force s to call itself repeatedly. If flip is 50-50, your chance of 10 nested recursions is about 1 in 1000, so you are probably OK. However, you might decide to bias each individual flip to choose the shorter phrase, based on the size of the output generated so far, or the depth of any recursion.

Now, what this won't do in the general case is produce semantically correct programs. That's because our parser is "context free"; there are no constraints on one part of the generated code forced by other parts. As an example, if your language had to declare a variable before using it, this scheme doesn't guarantee that a declaration for random-var-X will be produced before random-var-X appears in an expression.

There's no easy way to fix this, because language semantics aren't guaranteed to be "easy". It just goes to show that parsing a program ("technically easy") and checking for correct semantics ("arbitrarily hard", consider C++) leads to an equally hard problem: generating random programs that don't violate language semantics.
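
One way such a depth-based bias might look in Python is sketched below. The names `recurse_probability` and `nesting_depth`, and the parameters `base`, `decay` and `max_depth`, are all hypothetical; the geometric decay is just one possible choice of bias.

```python
import random


def recurse_probability(depth, base=0.5, decay=0.7, max_depth=20):
    """Chance of picking the recursive alternative at a given nesting depth."""
    if depth >= max_depth:
        return 0.0  # hard cap: never recurse beyond max_depth
    return base * decay ** depth


def nesting_depth(rng, **kwargs):
    """Count how often s would recurse before choosing the assignment."""
    depth = 0
    while rng.random() < recurse_probability(depth, **kwargs):
        depth += 1
    return depth


# With an unbiased 50-50 flip, 10 recursions in a row have probability:
print(0.5 ** 10)  # 0.0009765625, i.e. 1 in 1024

# With the bias, deep nesting becomes much rarer:
rng = random.Random(0)
print(max(nesting_depth(rng) for _ in range(1000)))
```

Inside the generator, the flip in s would then be called with the current depth, and each recursive call to s would pass depth + 1.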
