Guide on how to process an AST formatted as a JSON-like structure
With use SQL::Abstract::Tree in Perl, I am able to generate an AST for SQL by:
my $sqlat = SQL::Abstract::Tree->new;
my $tree = $sqlat->parse($query_str);
where $query_str is an SQL query.
As an example, the query string SELECT cust_id, a as A, z SUM(price) as q, from orders WHERE status > 55 produces:
[
  [
    "SELECT",
    [
      [
        "-LIST",
        [
          ["-LITERAL", ["cust_id"]],
          ["AS", [["-LITERAL", ["a"]], ["-LITERAL", ["A"]]]],
          [
            "AS",
            [
              ["-LITERAL", ["z"]],
              ["SUM", [["-PAREN", [["-LITERAL", ["price"]]]]]],
              ["-LITERAL", ["q"]],
            ],
          ],
          [],
        ],
      ],
    ],
  ],
  ["FROM", [["-LITERAL", ["orders"]]]],
  [
    "WHERE",
    [[">", [["-LITERAL", ["status"]], ["-LITERAL", [55]]]]],
  ],
]
I would like to walk the AST and derive certain information from it.
I would like to know if there is a guide, tutorial, or example source code that walks an AST in this type of format.
Most of the literature I have found on walking ASTs assumes I have some sort of class hierarchy, and describes some variation of the visitor pattern for walking it.
My specific use case is converting simple SQL queries to Mongo queries for the aggregation framework, with some examples given here.
Here is what I have been doing so far:
I first call a parse function on the tree. It dispatches on each subtree according to its type (the first element of each subtree) and calls the matching handler with the rest of the subtree. Here is my parse function:
sub parse {
    my ($tree) = @_;
    my %results = (ret => []);
    for my $subtree (@$tree) {
        my ($node_type, $node) = @$subtree;
        my $result_dic = $dispatch{$node_type}->($node);
        if ($result_dic->{type}) {
            my $type = $result_dic->{type};
            $results{$type} = [] unless $results{$type};
            # Explicit dereference: push on a bare reference no longer works in modern Perl.
            push @{ $results{$type} }, $result_dic->{ret};
            %results = merge_except_for($result_dic, \%results, 'ret', $type);
        }
        else {
            push @{ $results{ret} }, @{ $result_dic->{ret} };
        }
    }
    return \%results;
}
Which uses the following dispatch table:
my %dispatch = (
    SELECT => sub {
        my $node = shift;
        my $result_dic = parse($node);
        $result_dic->{type} = 'select';
        if ($result_dic->{as}) {
            # Explicit dereference: push on a bare reference no longer works in modern Perl.
            push @{ $result_dic->{ret} }, $result_dic->{as}->[0][0];
        }
        return $result_dic;
    },
    '-LITERAL' => sub {
        my $node = shift;
        my $literal = $node;
        return {ret => $node};
    },
    '-LIST' => sub {
        my $node = shift;
        my $result_dic = parse($node);
        my $ret = flatten_ret($result_dic);
        return flatten_ret($result_dic);
    },
    WHERE => sub {
        my $tree = shift;
        my @bin_ops = qw/= <= < >= >/;
        my $op = $tree->[0];
        if ($op ~~ @bin_ops) {
            # Not yet implemented
        }
        return {ret => ''};
    },
    FROM => sub {
        my $tree = shift;
        my $parse_result = parse($tree);
        return {ret => $parse_result->{ret},
                type => 'database'};
    },
    AS => sub {
        my $node = shift;
        my $result_dic = parse($node);
        $result_dic->{type} = 'as';
        return $result_dic;
    }
);

sub flatten_ret {
    my $result_dic = shift;
    return {ret => [
        map {
            ref($_) ? $_->[0] : $_
        } @{$result_dic->{ret}}]};
}
But I'm not sure about certain things, such as whether I should check for the node name "AS" in the SELECT subroutine or find a way to recurse to fill in the data.
Also, what type of data should be returned from each dispatch call, and how can I combine it at the end?
Also, I am new to AST processing and looking to get a grip on it, so advice on how I could improve my question would also be appreciated.
Your idea to do typed dispatch is roughly correct. Usually one might use objects and dispatch methods on them. But using a two-element list to tag data with some type works as well. Your misnamed parse function implements this dispatch, and somehow aggregates the output. I am not quite sure what you are trying to achieve with that.
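To make the tagged-list idea concrete, here is a minimal sketch of dispatching on [tag, children] pairs without any class hierarchy. The handler table %handler and the visit function are hypothetical names chosen for illustration, and the handlers simply render the subtree back to text; they are not part of SQL::Abstract::Tree.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical handlers, keyed by node tag. Each one receives the node's
# children (the second element of the [tag, children] pair).
my %handler = (
    '-LITERAL' => sub {
        my ($children) = @_;
        return $children->[0];
    },
    'AS' => sub {
        my ($children) = @_;
        my ($expr, $alias) = map { visit($_) } @$children;
        return "$expr AS $alias";
    },
    '-LIST' => sub {
        my ($children) = @_;
        return join ', ', map { visit($_) } @$children;
    },
);

# Look up the handler for the node's tag and fail loudly on unknown tags.
sub visit {
    my ($node) = @_;
    my ($tag, $children) = @$node;
    my $code = $handler{$tag}
        or croak "No handler for node type '$tag'";
    return $code->($children);
}

my $ast = ['-LIST', [
    ['-LITERAL', ['cust_id']],
    ['AS', [['-LITERAL', ['a']], ['-LITERAL', ['A']]]],
]];
print visit($ast), "\n";
```

Each handler decides for itself whether and how to recurse, which plays the role that visitor methods play in an object-based design.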
When doing AST transforms it is very useful to keep in mind exactly what output you want to create. Let's assume you want to transform
SELECT cust_id, a as A, SUM(price) as q from orders WHERE status > 55
into the data structure
{
    table  => 'orders',
    action => 'aggregate',
    query  => [
        '$match' => { 'status' => { '$gt' => 55 } },
        '$group' => {
            '_id'     => undef,
            'cust_id' => '$cust_id',
            'A'       => '$a',
            'q'       => { '$sum' => '$price' },
        },
    ],
}
What do we have to do for that?
1. Check that we have a SELECT ... FROM ... type query.
2. Set the action to aggregate.
3. Take the table name from the FROM entry.
4. For each SELECT item, get the name, and the expression that produces this value.
5. If a WHERE clause is present, translate each condition recursively.
If we encounter syntax which we cannot parse, throw an error.
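As an illustration of the WHERE step only, here is a rough sketch of a recursive condition translator. It takes a single condition node such as [">", [...]] from inside the WHERE clause. The names translate_condition, literal_value, and %mongo_op are hypothetical, the operator mapping is an assumption, and the shape assumed for AND/OR nodes (['AND', [cond, cond, ...]]) is a guess that should be checked against a real dump.

```perl
use strict;
use warnings;
use Carp;

# Assumed mapping from SQL comparison operators to Mongo query operators.
my %mongo_op = (
    '>' => '$gt', '>=' => '$gte', '<' => '$lt', '<=' => '$lte',
);

# Extract the value of a ['-LITERAL', [value]] node, or throw.
sub literal_value {
    my ($node) = @_;
    $node->[0] eq '-LITERAL'
        or croak "Expected a literal, found $node->[0]";
    return $node->[1][0];
}

sub translate_condition {
    my ($node) = @_;
    my ($op, $args) = @$node;

    # Assumption: AND/OR nodes look like ['AND', [cond, cond, ...]].
    if ($op eq 'AND' or $op eq 'OR') {
        my $key = '$' . lc($op);
        return { $key => [ map { translate_condition($_) } @$args ] };
    }

    my ($lhs, $rhs) = @$args;
    my $field = literal_value($lhs);
    my $value = literal_value($rhs);
    return { $field => $value } if $op eq '=';

    my $mongo = $mongo_op{$op}
        or croak "Cannot translate operator '$op'";
    return { $field => { $mongo => $value } };
}

my $where = ['>', [['-LITERAL', ['status']], ['-LITERAL', [55]]]];
my $match = translate_condition($where);
print $match->{status}{'$gt'}, "\n";
```

Anything the function does not recognise ends in croak rather than being silently dropped, in line with the "throw an error" rule above.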
Note that my approach starts from the top, and extracts information from deeper in the AST when we need it. This is in contrast to your bottom-up approach that munges all data together and hopes something relevant remains at the end. Your hash merging in particular looks dubious.
How can this be implemented? Here is a start:
use strict;
use warnings;
use Carp;

sub translate_select_statement {
    my ($select, $from, @other_clauses) = @_;
    $select->[0] eq 'SELECT'
        or croak "First clause must be a SELECT clause, not $select->[0]";
    $from->[0] eq 'FROM'
        or croak "Second clause must be a FROM clause, not $from->[0]";

    # In the dump shown in the question, the children of SELECT are a
    # one-element list holding the -LIST node, so unwrap that element.
    my ($select_list) = @{ $select->[1] };
    my %groups = (
        _id => undef,
        translate_select_list(get_list_items($select_list)),
    );
    ...
}

sub get_list_items {
    my ($list) = @_;
    if ($list->[0] eq '-LIST') {
        return @{ $list->[1] };
    }
    else {
        # so it's probably just a single item
        return $list;
    }
}

sub translate_select_list {
    my %out;
    for my $item (@_) {
        my ($type, $data) = @$item;
        if ($type eq '-LITERAL') {
            my ($name) = @$data;
            $out{$name} = '$' . $name;
        }
        # The tag in the AST is 'AS', without a leading dash.
        elsif ($type eq 'AS') {
            my ($expr, $name_literal) = @$data;
            $name_literal->[0] eq '-LITERAL'
                or croak "in 'x AS y' expression, y must be a literal, but it was $name_literal->[0]";
            $out{$name_literal->[1][0]} = translate_expression($expr);
        }
        else {
            croak "In select list, items must be literals or 'x AS y' expressions. Found [$type, $data] instead.";
        }
    }
    return %out;
}

sub translate_expression { ... }
The way I structured this, it is much more like a top-down parser, but for the translation of arithmetic expressions, for example, type dispatch is more important. In the above code, if/else cases are better, because they allow for more validation.
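For the expression side, where type dispatch matters more, one possible way to fill in a translate_expression-style function is a dispatch table keyed by node type. This is only a sketch: the table name %expression is hypothetical, it covers just the three node types that appear in the example query, and it assumes every literal in an expression is a field name (so a numeric literal would be mistranslated).

```perl
use strict;
use warnings;
use Carp;

# Hypothetical dispatch table: node type => translator for its arguments.
my %expression = (
    # Assumption: a literal inside an expression is always a field name.
    '-LITERAL' => sub {
        my ($args) = @_;
        return '$' . $args->[0];
    },
    # Parentheses carry no meaning of their own; translate what is inside.
    '-PAREN' => sub {
        my ($args) = @_;
        @$args == 1
            or croak "Expected exactly one expression in parentheses";
        return translate_expression($args->[0]);
    },
    'SUM' => sub {
        my ($args) = @_;
        return { '$sum' => translate_expression($args->[0]) };
    },
);

sub translate_expression {
    my ($node) = @_;
    my ($type, $args) = @$node;
    my $code = $expression{$type}
        or croak "Cannot translate expression of type '$type'";
    return $code->($args);
}

my $sum = ['SUM', [['-PAREN', [['-LITERAL', ['price']]]]]];
my $out = translate_expression($sum);
print $out->{'$sum'}, "\n";   # prints $price
```

Supporting another function or operator then means adding one entry to the table, while unknown node types still fail with an explicit error.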