
Creating a DSL expressions parser / rules engine

I'm building an app that lets users embed expressions/rules in a YAML config file. For example, a user can reference a variable defined in the YAML file with ${variables.name == 'John'} or ${is_equal(variables.name, 'John')}. I can probably get by with simple expressions, but I want to support complex rules/expressions such as ${variables.name == 'John'} and (${variables.age > 18} OR ${variables.adult == true}).

I'm looking for a parsing/DSL/rules-engine library that can support these types of expressions and normalize them. I'm open to using Ruby, JavaScript, Java, or Python if anyone knows of a library for those languages.

One option I thought of was to just support JavaScript for conditions/rules and pass it through eval, with a context set up to give access to variables and other referenceable values.

I don't know whether you use Golang, but if you do, I recommend https://github.com/antonmedv/expr.

I have used it for parsing bot strategies (for a stock options bot). This is from my unit test:

package main

import (
    "regexp"
    "strings"
    "testing"
)

func TestPattern(t *testing.T) {
    a := "pattern('asdas asd 12dasd') && lastdigit(23asd) < sma(50) && sma(14) > sma(12) && ( macd(5,20) > macd_signal(12,26,9) || macd(5,20) <= macd_histogram(12,26,9) )"

    // Match every function-call-like token, e.g. sma(50) or pattern('...').
    callRe := regexp.MustCompile(`(\w+)(\s+)?[(]['\d.,\s\w]+[)]`)
    patternRe := regexp.MustCompile(`pattern(\s+)?\('(.+)'\)`)
    lastDigitRe := regexp.MustCompile(`lastdigit(\s+)?\((.+)\)`)
    notDU := regexp.MustCompile(`[^du]`)
    nonDigit := regexp.MustCompile(`[^\d]`)

    indicator := callRe.FindAllString(a, -1)
    t.Logf("%v\n", indicator)
    t.Logf("%v\n", len(indicator))

    for _, i := range indicator {
        t.Logf("%v\n", i)
        if strings.HasPrefix(i, "pattern") {
            // Extract the quoted argument, then count the characters other than 'd' and 'u'.
            check1 := patternRe.ReplaceAllString(i, "$2")
            t.Logf("%v\n", check1)
            check2 := notDU.FindAllString(check1, -1)
            t.Logf("%v\n", len(check2))
        } else if strings.HasPrefix(i, "lastdigit") {
            // Extract the argument, then list its non-digit characters.
            args := lastDigitRe.ReplaceAllString(i, "$2")
            parameter := nonDigit.FindAllString(args, -1)
            t.Logf("%v\n", parameter)
        }
        // All other calls are left unchanged.
    }
}

Combine it with regex and you have a good (if not great) string translator.

For Java, I personally use https://github.com/ridencww/expression-evaluator, though not in production. It has features similar to the library linked above.

It supports many conditions, and you don't have to worry about parentheses and brackets.

Assignment  =
Operators   + - * / DIV MOD % ^ 
Logical     < <= == != >= > AND OR NOT
Ternary     ? :  
Shift       << >>
Property    ${<id>}
DataSource  @<id>
Constants   NULL PI
Functions   CLEARGLOBAL, CLEARGLOBALS, DIM, GETGLOBAL, SETGLOBAL
            NOW PRECISION

Hope it helps.

You might be surprised to see how far you can get with a syntax parser and 50 lines of code!

Check this out. The Abstract Syntax Tree (AST) on the right represents the code on the left as nice data structures. You can use these data structures to write your own simple interpreter.

I wrote a little example of one: https://codesandbox.io/s/nostalgic-tree-rpxlb?file=/src/index.js

Open up the console (the button at the bottom), and you'll see the result of the expression!

This example can only handle (||) and (>), but looking at the code (line 24), you can see how you could make it support any other JS operator. Just add a case to the branch, evaluate both sides, and do the calculation in JS.

Parentheses and operator precedence are all handled by the parser for you.
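The same idea can be sketched in Python, which ships a parser in its standard `ast` module. This is only an illustration under assumptions, not the code from the sandbox above: the function names `evaluate` and `_eval`, the choice of supported node types, and the mapping of `variables.name` to a dictionary lookup are all made up for this example.

```python
import ast
import operator

# Allowlisted comparison operators; any other operator is rejected.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def evaluate(expression, variables):
    """Parse `expression` with Python's parser and interpret the resulting tree."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body, variables)


def _eval(node, variables):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    if isinstance(node, ast.Attribute):
        # Hypothetical convention: variables.name means variables["name"].
        return _eval(node.value, variables)[node.attr]
    if isinstance(node, ast.BoolOp):
        values = (_eval(v, variables) for v in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, variables)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, variables)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in _COMPARE:
                raise ValueError(f"unsupported operator: {type(op).__name__}")
            right = _eval(comparator, variables)
            if not _COMPARE[type(op)](left, right):
                return False
            left = right
        return True
    # Calls, subscripts, lambdas, etc. are never interpreted.
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


context = {"variables": {"name": "John", "age": 17, "adult": True}}
rule = "variables.name == 'John' and (variables.age > 18 or variables.adult == True)"
print(evaluate(rule, context))
```

Supporting another operator means adding one more branch to `_eval`; anything without a branch raises `ValueError` instead of being executed.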

I'm not sure if this is the solution for you, but it will for sure be fun ;)

One option I thought of was to just support JavaScript for conditions/rules and pass it through eval, with a context set up to give access to variables and other referenceable values.

I would personally lean towards something like this. Once you get into complexities such as logic comparisons, a DSL can become a beast, since at that point you are basically writing a compiler and a language. You might want to drop the config format altogether and instead have the configurable file be JavaScript (or whatever language) that is evaluated and then loaded. Whoever the target audience for this "config" file is can then supply logical expressions as needed.

The only reason I would not do this is if the configuration file were exposed to the public, but in that case securing a parser would also be quite difficult.
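As a rough illustration of the "config file is code" idea, here is a minimal Python sketch. The file name `rules_config.py`, the rule name `is_allowed`, and the helper `load_rules` are hypothetical. Loading the file runs arbitrary code, so this only makes sense when the author of the file is fully trusted, as noted above.

```python
import os
import tempfile
import textwrap

# Hypothetical config file contents: rules are ordinary functions.
CONFIG_SOURCE = textwrap.dedent('''
    def is_allowed(variables):
        return variables["name"] == "John" and (
            variables["age"] > 18 or variables["adult"] is True
        )
''')


def load_rules(path):
    """Execute the config file and return the public callables it defines."""
    namespace = {}
    with open(path) as fh:
        # Runs the file with full privileges: trusted input only.
        exec(compile(fh.read(), path, "exec"), namespace)
    return {
        name: obj
        for name, obj in namespace.items()
        if callable(obj) and not name.startswith("_")
    }


# Write the sample config to a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "rules_config.py")
    with open(config_path, "w") as fh:
        fh.write(CONFIG_SOURCE)
    rules = load_rules(config_path)

result = rules["is_allowed"]({"name": "John", "age": 17, "adult": True})
print(result)
```

The trade-off is that there is no parser to write or maintain, but the "config" can do anything the host process can do.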

I'm building an app which has a feature for embedding expressions/rules in a config yaml file.

I'm looking for a parsing/dsl/rules-engine library that can support these type of expressions and normalize it. I'm open using ruby, javascript, java, or python if anyone knows of a library for that languages.

One possibility might be to embed a rule interpreter such as ClipsRules inside your application. You could then code your application in C++ (perhaps inspired by my clips-rules-gcc project) and link it to a C++ YAML library such as yaml-cpp.

Another approach could be to embed a Python interpreter inside a rule interpreter (perhaps the same ClipsRules) together with a YAML library.

A third approach could be to use Guile (or SBCL or JavaScript V8) and extend it with some "expert system shell".

Before starting to code, be sure to read several books such as the Dragon Book, the Garbage Collection Handbook, Lisp In Small Pieces, and Programming Language Pragmatics. Be aware of parser generators such as ANTLR or GNU Bison, and of JIT compilation libraries like libgccjit or asmjit.

You might need to contact a lawyer about the legal compatibility of various open source licenses.

Some thoughts and things you should consider.

1. Unified Expression Language (EL)

Another option is EL, specified as part of the JSP 2.1 standard (JSR-245). See the official documentation.

They have some nice examples that give you a good overview of the syntax. For example:

   EL Expression: `${100.0 == 100}` Result=  `true`
   EL Expression: `${4 > 3}`        Result=  `true`

You can use this to evaluate small script-like expressions. There are several implementations; JUEL is one open source implementation of the EL language.

2. Audience and Security

All the answers recommend using different interpreters or parser generators, and all are valid ways to add functionality for processing complex data. But I would like to add an important note here.

Every interpreter has a parser, and injection attacks target those parsers, tricking them into interpreting data as commands. You should have a clear understanding of how the interpreter's parser works, because that's the key to reducing the chances of a successful injection attack. Real-world parsers have many corner cases and flaws that may not match the specs. Be clear about the measures you will take to mitigate possible flaws.

This applies even if your application is not public-facing: external or internal actors can still abuse this feature.
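To make "interpreting data as commands" concrete, here is a small Python sketch. The function names and the payload are made up for illustration. Python's `eval` with emptied builtins stands in for "some expression interpreter" purely for brevity; it is not a sandbox and is not being recommended here.

```python
def matches_by_concatenation(stored_name, user_input):
    # Anti-pattern: user data is spliced into the expression text,
    # so the parser cannot tell data apart from code.
    expression = "name == '%s'" % user_input
    return eval(expression, {"__builtins__": {}}, {"name": stored_name})


def matches_by_binding(stored_name, user_input):
    # The expression text is fixed; user data is only ever a value.
    return eval(
        "name == candidate",
        {"__builtins__": {}},
        {"name": stored_name, "candidate": user_input},
    )


# The quote characters in the payload end the string literal early,
# turning the rest of the data into part of the expression.
payload = "x' or 'a' == 'a"

print(matches_by_concatenation("John", payload))  # the check is bypassed
print(matches_by_binding("John", payload))        # the check still holds
```

The first function evaluates `name == 'x' or 'a' == 'a'`, which is always true; the second compares the stored name against the whole payload as a plain string. Whatever engine you choose, passing user data as bound variables rather than as expression text removes this class of problem.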

I did something like that once; you can probably pick it up and adapt it to your needs.

TL;DR: thanks to Python's eval, doing this is a breeze.

The problem was to parse dates and durations in textual form. What I did was create a YAML file mapping regex patterns to results. The mapping itself was a Python expression that would be evaluated with the match object and had access to other functions and variables defined elsewhere in the file.

For example, the following self-contained snippet would recognize times like "l'11 agosto del 1993" (Italian for "August 11th, 1993").

__meta_vars__:
  month: (gennaio|febbraio|marzo|aprile|maggio|giugno|luglio|agosto|settembre|ottobre|novembre|dicembre)
  prep_art: (il\s|l\s?'\s?|nel\s|nell\s?'\s?|del\s|dell\s?'\s?)
  schema:
    date: http://www.w3.org/2001/XMLSchema#date

__meta_func__:
  - >
    def month_to_num(month):
        """ gennaio -> 1, febbraio -> 2, ..., dicembre -> 12 """
        try:
            return index_in_or(meta_vars['month'], month) + 1
        except ValueError:
            return month

Tempo:
  - \b{prep_art}(?P<day>\d{{1,2}}) (?P<month>{month}) {prep_art}?\s*(?P<year>\d{{4}}): >
      '"{}-{:02d}-{:02d}"^^<{schema}>'.format(match.group('year'),
                                              month_to_num(match.group('month')),
                                              int(match.group('day')),
                                              schema=schema['date'])

__meta_func__ and __meta_vars__ (not the best names, I know) define functions and variables that are accessible to the match transformation rules. To make the rules easier to write, the pattern is formatted using the meta-variables, so that {month} is replaced with the pattern matching all months. The transformation rule calls the meta-function month_to_num to convert the month to a number from 1 to 12, and reads from the schema meta-variable. In the example above, the match results in the string "1993-08-11"^^<http://www.w3.org/2001/XMLSchema#date>, but some other rules would produce a dictionary.

Doing this is quite easy in Python, as you can use exec to evaluate strings as Python code (obligatory warning about security implications). The meta-functions and meta-variables are evaluated and stored in a dictionary, which is then passed to the match transformation rules.
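A stripped-down Python 3 sketch of that mechanism, independent of the actual project code: the names `META_VARS`, `META_FUNC_SOURCE`, and `namespace` are made up for this illustration, the pattern is simplified, and `month_to_num` is reimplemented here without the original's helper. It uses `exec` and `eval`, so it carries the same security caveat.

```python
import re

# Meta-variables, as they might be loaded from the YAML file.
META_VARS = {
    "month": "(gennaio|febbraio|marzo|aprile|maggio|giugno|luglio"
             "|agosto|settembre|ottobre|novembre|dicembre)",
}

# Meta-function source, as it might appear as a string in the YAML file.
META_FUNC_SOURCE = '''
def month_to_num(month):
    """ gennaio -> 1, febbraio -> 2, ..., dicembre -> 12 """
    months = meta_vars["month"].strip("()").split("|")
    return months.index(month) + 1
'''

# Compile the meta-functions into a namespace that also holds the meta-variables.
namespace = {"meta_vars": META_VARS}
exec(META_FUNC_SOURCE, namespace)

# A rule: a pattern template plus a transformation expression, both strings.
pattern_template = r"(?P<day>\d{{1,2}}) (?P<month>{month}) del (?P<year>\d{{4}})"
transform = (
    "'{}-{:02d}-{:02d}'.format(match.group('year'), "
    "month_to_num(match.group('month')), int(match.group('day')))"
)

# Fill in the meta-variables; doubled braces become literal regex braces.
pattern = re.compile(pattern_template.format(**META_VARS))

match = pattern.search("l'11 agosto del 1993")
# The transformation sees the meta-functions as globals and the match as a local.
result = eval(transform, namespace, {"match": match})
print(result)
```

Because the functions are compiled with `namespace` as their globals, `month_to_num` can read `meta_vars` without any extra wiring.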

The code is on GitHub; feel free to ask if you need clarification. Relevant parts, slightly edited:

class DateNormalizer:
    def _meta_init(self, specs):
        """ Reads the meta variables and the meta functions from the specification
        :param dict specs: The specifications loaded from the file
        :return: None
        """
        self.meta_vars = specs.pop('__meta_vars__')

        # compile meta functions in a dictionary
        self.meta_funcs = {}
        for f in specs.pop('__meta_func__'):
            exec(f, self.meta_funcs)

        # make meta variables available to the meta functions just defined
        self.meta_funcs['__builtins__']['meta_vars'] = self.meta_vars

        self.globals = self.meta_funcs
        self.globals.update(self.meta_vars)

    def normalize(self, expression):
        """ Find the first matching part in the given expression
        :param str expression: The expression in which to search the match
        :return: Tuple with (start, end), category, result
        :rtype: tuple
        """
        expression = expression.lower()
        # self.regexes is built in a part of the class not shown here
        for category, regexes in self.regexes.items():
            for regex, transform in regexes:
                match = regex.search(expression)
                if match:
                    # evaluate the rule with the meta functions/variables as globals
                    result = eval(transform, self.globals, {'match': match})
                    start, end = match.span()
                    # first_position is not defined in this excerpt
                    return (first_position + start, first_position + end), category, result

Here are some categorized Ruby options and resources:

Insecure

  1. Pass the expression to eval in the language of your choice.

It must be mentioned that eval is technically an option, but it requires extraordinary trust in its inputs, and it is safer to avoid it altogether.

Heavyweight

  1. Write a parser for your expressions and an interpreter to evaluate them.

A cost-intensive solution would be implementing your own expression language. That is, design a lexicon for your expression language, implement a parser for it, and implement an interpreter to execute the parsed code.
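To give a sense of the effort involved, here is a minimal hand-written lexer and recursive-descent evaluator, sketched in Python rather than Ruby. The grammar (dotted names, numbers, single-quoted strings, comparisons, case-insensitive `and`/`or`, parentheses) and every name in the code are assumptions chosen to fit the question's example, not an existing library. For brevity it evaluates while parsing instead of building a tree first.

```python
import operator
import re

TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<number>\d+(?:\.\d+)?)
      | (?P<string>'[^']*')
      | (?P<op>==|!=|<=|>=|<|>|\(|\))
      | (?P<name>[A-Za-z_][\w.]*)
    )""", re.VERBOSE)

COMPARE = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
           "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def tokenize(text):
    """Split the input into (kind, text) pairs."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens, variables):
        self.tokens, self.pos, self.variables = tokens, 0, variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def is_keyword(self, word):
        kind, text = self.peek()
        return kind == "name" and text.lower() == word

    def parse_or(self):  # or_expr := and_expr ("or" and_expr)*
        value = self.parse_and()
        while self.is_keyword("or"):
            self.take()
            right = self.parse_and()
            value = bool(value) or bool(right)
        return value

    def parse_and(self):  # and_expr := comparison ("and" comparison)*
        value = self.parse_comparison()
        while self.is_keyword("and"):
            self.take()
            right = self.parse_comparison()
            value = bool(value) and bool(right)
        return value

    def parse_comparison(self):  # comparison := atom (op atom)?
        left = self.parse_atom()
        kind, text = self.peek()
        if kind == "op" and text in COMPARE:
            self.take()
            return COMPARE[text](left, self.parse_atom())
        return left

    def parse_atom(self):  # atom := number | string | name | "(" or_expr ")"
        kind, text = self.take()
        if kind == "number":
            return float(text) if "." in text else int(text)
        if kind == "string":
            return text[1:-1]
        if kind == "name":
            if text.lower() in ("true", "false"):
                return text.lower() == "true"
            value = self.variables
            for part in text.split("."):  # dotted name -> nested dict lookup
                value = value[part]
            return value
        if text == "(":
            value = self.parse_or()
            if self.take() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate_rule(text, variables):
    parser = Parser(tokenize(text), variables)
    result = parser.parse_or()
    if parser.pos != len(parser.tokens):
        raise SyntaxError("unexpected trailing input")
    return result


ctx = {"variables": {"name": "John", "age": 17, "adult": True}}
print(evaluate_rule(
    "variables.name == 'John' and (variables.age > 18 OR variables.adult == true)", ctx))
```

Even this toy version needs a token grammar, a precedence scheme, and error handling, which is why this route is listed as the heavyweight one.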

Some Parsing Options (Ruby)

Medium Weight

  1. Pick an existing language to write expressions in, and parse / interpret those expressions.

This route assumes you can pick a known language to write your expressions in. The benefit is that a parser likely already exists for that language to turn it into an Abstract Syntax Tree (a data structure that can be walked for interpretation).

A Ruby example with the Parser gem:

require 'parser/current'

class MyInterpreter
  # https://whitequark.github.io/ast/AST/Processor/Mixin.html
  include ::AST::Processor::Mixin

  def on_str(node)
    node.children.first
  end

  def on_int(node)
    node.children.first.to_i
  end

  def on_if(node)
    expression, truthy, falsey = *node.children
    if process(expression)
      process(truthy)
    else
      process(falsey)
    end
  end

  def on_true(_node)
    true
  end

  def on_false(_node)
    false
  end

  def on_lvar(node)
    # lookup a variable by name=node.children.first
  end

  def on_send(node, &block)
    # allow things like ==, string methods? whatever
    # (bare names such as `name` also arrive here, as (send nil :name))
  end

  # ... etc (e.g. on_and for &&)
end

ast = Parser::CurrentRuby.parse(<<~RUBY)
  name == 'John' && adult
RUBY
MyInterpreter.new.process(ast)
# => true (once the stubbed and omitted handlers are implemented)

The benefit here is that the parser and syntax are predetermined, and you can interpret only what you need to (and prevent malicious code from executing by controlling what on_send and on_const allow).

Templating

This is more markup-oriented and possibly doesn't apply, but you could find some use in a templating library, which parses and evaluates expressions for you. Depending on the library you use, it should be possible to control the expressions and supply variables to them. The output of the expression could be checked for truthiness.
