Guide on how to process an AST formatted as a JSON-like structure

Using SQL::Abstract::Tree in Perl, I am able to generate an AST for an SQL query like this:

my $sqlat = SQL::Abstract::Tree->new;
my $tree = $sqlat->parse($query_str);

where $query_str is an SQL query.

For example, the query string SELECT cust_id, a as A, z SUM(price) as q, from orders WHERE status > 55 produces:

[
  [
    "SELECT",
    [
      [
        "-LIST",
        [
          ["-LITERAL", ["cust_id"]],
          ["AS", [["-LITERAL", ["a"]], ["-LITERAL", ["A"]]]],
          [
            "AS",
            [
              ["-LITERAL", ["z"]],
              ["SUM", [["-PAREN", [["-LITERAL", ["price"]]]]]],
              ["-LITERAL", ["q"]],
            ],
          ],
          [],
        ],
      ],
    ],
  ],
  ["FROM", [["-LITERAL", ["orders"]]]],
  [
    "WHERE",
    [[">", [["-LITERAL", ["status"]], ["-LITERAL", [55]]]]],
  ],
]

I would like to walk the AST and derive certain information from it.

I would like to know if there is a guide, tutorial, or example source code that walks an AST in this type of format.

Most of the literature I have found on walking ASTs assumes that I have some sort of class hierarchy and describes some variation of the visitor pattern for walking it.

My specific use case is converting simple SQL queries to Mongo queries for the aggregation framework, with some examples given here.

Here is what I have been doing so far:

I first call a parse function with the tree. It dispatches on each subtree according to its type (the first element of each subtree) and calls the matching handler with the rest of the subtree. Here is my parse function:

sub parse {
    my ($tree) = @_;

    my %results = (ret => []);
    for my $subtree (@$tree) {
        my ($node_type, $node) = @$subtree;

        my $result_dic = $dispatch{$node_type}->($node);
        if ($result_dic->{type}) {
            my $type = $result_dic->{type};
            $results{$type} = [] unless $results{$type};
            # push needs a real array, so dereference the array ref
            push @{ $results{$type} }, $result_dic->{ret};
            %results = merge_except_for($result_dic, \%results, 'ret', $type);
        }
        else {
            push @{$results{ret}}, @{$result_dic->{ret}};
        }

    }


    return \%results;

}

It uses the following dispatch table:

my %dispatch = (
    SELECT => sub {

        my $node = shift;
        my $result_dic = parse($node);
        $result_dic->{type} = 'select';
        if ($result_dic->{as}) {
            # push needs a real array, so dereference the array ref
            push @{ $result_dic->{ret} }, $result_dic->{as}->[0][0];
        }
        return $result_dic;
    },
    '-LITERAL' => sub {
        my $node = shift;
        return {ret => $node};
    },
    '-LIST' => sub {
        my $node = shift;
        my $result_dic = parse($node);

        return flatten_ret($result_dic);
    },
    WHERE => sub {
        my $tree = shift;
        my @bin_ops = qw/= <= < >= >/;

        my $op = $tree->[0];
        if ($op ~~ @bin_ops) {
            # Not yet implemented
        }
        return {ret => ''};

    },
    FROM => sub {
        my $tree = shift;
        my $parse_result = parse($tree);
        return {ret => $parse_result->{ret},
                type => 'database'};
    },
    AS => sub {
        my $node = shift;

        my $result_dic = parse($node);
        $result_dic->{type} = 'as';
        return $result_dic;
    }
);

sub flatten_ret {
    my $result_dic = shift;

    return {ret => [
        map {
            ref($_) ? $_->[0] : $_
        } @{$result_dic->{ret}}]};
}

But I'm not sure about certain things, such as whether I should check for the node name "AS" inside the SELECT subroutine, or find a way to recurse to fill in the data.

Also, what type of data should be returned from each dispatch call, and how can I combine it at the end?

Also, I am new to AST processing and looking to get a grip on it, so advice on how I could improve my question would also be appreciated.

Your idea to do typed dispatch is roughly correct. Usually one might use objects and dispatch methods on them, but using a two-element list to tag data with some type works as well. Your misnamed parse function implements this dispatch, and somehow aggregates the output. I am not quite sure what you are trying to achieve with that.
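As a minimal sketch of dispatching on such tagged two-element lists, the following walks a tree and collects every literal it finds. The names %handler and walk are made up for this illustration, and the tree is a shortened version of the one in the question; unknown node types simply fall through to a default that recurses into the children.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one handler per node type (the tag).
my %handler = (
    '-LITERAL' => sub { my ($data) = @_; return @$data },
);

# Hypothetical walker: returns all literal values in document order.
sub walk {
    my ($node) = @_;
    return () unless ref $node eq 'ARRAY' && @$node;   # skip empty items

    my ($type, $data) = @$node;

    # No string tag in front: this is a plain list of nodes.
    return map { walk($_) } @$node if ref $type;

    # Default for node types without a handler: recurse into the children.
    my $h = $handler{$type} // sub { map { walk($_) } @{ $_[0] } };
    return $h->($data);
}

my $tree = [
    [ 'SELECT', [ [ '-LIST', [
        [ '-LITERAL', ['cust_id'] ],
        [ 'AS', [ [ '-LITERAL', ['a'] ], [ '-LITERAL', ['A'] ] ] ],
    ] ] ] ],
    [ 'FROM', [ [ '-LITERAL', ['orders'] ] ] ],
];

my @literals = walk($tree);
print join(', ', @literals), "\n";   # cust_id, a, A, orders
```

Adding behaviour for another node type then only means adding another entry to the table.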

When doing AST transforms it is very useful to keep in mind exactly what output you want to create. Let's assume you want to transform

SELECT cust_id, a as A, SUM(price) as q from orders WHERE status > 55

into the data structure

{
  table  => 'orders',
  action => 'aggregate',
  query  => [
    '$match' => { 'status' => { '$gt' => 55 } },
    '$group' => {
       '_id'     => undef,
       'cust_id' => '$cust_id',
       'A'       => '$a',
       'q'       => { '$sum' => '$price' },
    },
  ],
}

What do we have to do for that?

  • Assert that we have a SELECT ... FROM ... type query.
  • Set the action to aggregate.
  • Extract the table name from the FROM entry.
  • Assemble the query:
    • For each SELECT item, get the name and the expression that produces this value.
      • Build each expression recursively.
    • If a WHERE clause is present, translate each condition recursively.

If we encounter syntax which we cannot translate, throw an error.
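The WHERE step could be sketched roughly as follows. The helper names translate_where and translate_condition and the %mongo_op table are invented for this example. The shape of a comparison node is taken from the AST in the question; that AND/OR conditions appear as ['AND', [cond, cond, ...]] is an assumption, so check it against the actual parser output.

```perl
use strict;
use warnings;
use Carp;

# Assumed mapping from SQL comparison operators to MongoDB query operators.
my %mongo_op = (
    '='  => '$eq',
    '<'  => '$lt',
    '<=' => '$lte',
    '>'  => '$gt',
    '>=' => '$gte',
);

# Hypothetical helper: translate one condition node recursively.
sub translate_condition {
    my ($cond) = @_;
    my ($op, $args) = @$cond;

    # Assumption: logical operators look like ['AND', [cond, cond, ...]].
    if ($op eq 'AND' or $op eq 'OR') {
        my %doc = ('$' . lc($op) => [ map { translate_condition($_) } @$args ]);
        return \%doc;
    }

    my $mongo = $mongo_op{$op}
        or croak "Unsupported operator '$op' in WHERE clause";

    my ($lhs, $rhs) = @$args;
    $lhs->[0] eq '-LITERAL' && $rhs->[0] eq '-LITERAL'
        or croak "Only 'field $op value' comparisons are supported";

    my ($field) = @{ $lhs->[1] };
    my ($value) = @{ $rhs->[1] };
    my %doc = ($field => { $mongo => $value });
    return \%doc;
}

# Hypothetical helper: translate a whole ['WHERE', [condition]] clause.
sub translate_where {
    my ($clause) = @_;
    my ($type, $conds) = @$clause;
    $type eq 'WHERE'
        or croak "Expected a WHERE clause, not $type";
    @$conds == 1
        or croak "Expected exactly one top-level condition";
    return translate_condition($conds->[0]);
}

my $where = [ 'WHERE', [ [ '>', [ [ '-LITERAL', ['status'] ], [ '-LITERAL', [55] ] ] ] ] ];
my $match = translate_where($where);
# $match is now { status => { '$gt' => 55 } }
```

Anything the sketch does not recognise ends in a croak rather than being silently dropped, which is the point of the last step above.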

Note that my approach starts from the top and extracts information from deeper in the AST when it is needed. This is in contrast to your bottom-up approach, which munges all data together and hopes something relevant remains at the end. Your hash merging in particular looks dubious.

How can this be implemented? Here is a start:

use strict;
use warnings;
use Carp;

sub translate_select_statement {
  my ($select, $from, @other_clauses) = @_;
  $select->[0] eq 'SELECT'
    or croak "First clause must be a SELECT clause, not $select->[0]";
  $from->[0] eq 'FROM'
    or croak "Second clause must be a FROM clause, not $from->[0]";

  # the SELECT data is a one-element list holding the -LIST (or single item) node
  my ($select_list) = @{ $select->[1] };
  my %groups = (
    _id => undef,
    translate_select_list(get_list_items($select_list)),
  );

  ...
}

sub get_list_items {
  my ($list) = @_;
  if ($list->[0] eq '-LIST') {
    return @{ $list->[1] };
  }
  else {
    # so it's probably just a single item
    return $list;
  }
}

sub translate_select_list {
  my %out;
  for my $item (@_) {
    my ($type, $data) = @$item;
    if ($type eq '-LITERAL') {
      my ($name) = @$data;
      $out{$name} = '$' . $name;
    }
    elsif ($type eq 'AS') {
      my ($expr, $name_literal) = @$data;
      $name_literal->[0] eq '-LITERAL'
        or croak "in 'x AS y' expression, y must be a literal, but it was $name_literal->[0]";
      $out{$name_literal->[1][0]} = translate_expression($expr);
    }
    else {
      croak "In select list, items must be literals or 'x AS y' expressions. Found [$type, $data] instead.";
    }
  }
  return %out;
}

sub translate_expression { ... }

The way I structured this is much more like a top-down parser, but for some tasks, e.g. translating arithmetic expressions, type dispatch is more important. In the above code, if/else cases are better because they allow for more validation.
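To illustrate where type dispatch fits, here is one possible way translate_expression could be filled in. It is only a sketch: the %expression_handler table is a name invented here, only the node types from the example query (-LITERAL, -PAREN, SUM) are covered, and every literal is treated as a field reference.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical dispatch table: node type => handler taking the node's data.
my %expression_handler = (
    '-LITERAL' => sub {
        my ($data) = @_;
        # Assumption: every literal is a field name, so 'price' becomes '$price'.
        return '$' . $data->[0];
    },
    '-PAREN' => sub {
        my ($data) = @_;
        @$data == 1
            or croak "Expected exactly one expression in parentheses";
        return translate_expression($data->[0]);
    },
    'SUM' => sub {
        my ($data) = @_;
        @$data == 1
            or croak "SUM takes exactly one argument";
        my %doc = ('$sum' => translate_expression($data->[0]));
        return \%doc;
    },
);

sub translate_expression {
    my ($expr) = @_;
    my ($type, $data) = @$expr;
    my $handler = $expression_handler{$type}
        or croak "Cannot translate expression of type '$type'";
    return $handler->($data);
}

# SUM(price), as it appears in the AST from the question
my $sum = translate_expression(
    [ 'SUM', [ [ '-PAREN', [ [ '-LITERAL', ['price'] ] ] ] ] ]
);
# $sum is now { '$sum' => '$price' }
```

Each handler validates its own arguments and recurses through translate_expression, so supporting another function or operator only means adding another entry to the table.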
