简体   繁体   中英

How to print an AST from a parser generator to graphviz?

I try to print an AST representation from the calculator example of PackCC .

I added a "Node" object and functions to write a dot file.

That seems to works correctly with '+' and '-' expressions, but the result is unexpected when I mix '+' and '*'.

在此处输入图像描述

Notice these two operations has different precedence levels if it can help you.

This is the grammar file.

%prefix "calc"
%value "Node*" # The type of "$$" values.

###############################################################################

%header{
    typedef struct Node {
        const char* data;
        struct Node* l_child;
        struct Node* r_child;
    }Node;
    
    Node* node(const char* text, Node* left, Node* right);
    
    void dot_graph(char* name, Node* root);
}

###############################################################################

%source {
    #include <stdio.h>
    #include <stdlib.h>
    
    Node* node(const char* text, Node* left, Node* right)
    {
        Node* res = malloc(sizeof (Node));
        res->data = text;
        res->l_child = left;
        res->r_child = right;
        return res;
    }
    
    void fill_dot(FILE* f, Node* n, Node* parent, int parentNum)
    {
        static int currentNum; // This number will be printed to make unique names.
        // Lines that represents nodes.
        if (parent == NULL) {
            fprintf(f, "    node%d [label=\"%s\"]\n", currentNum, n->data);
        }
        else {
            fprintf(f, "    node%d [label=\"%s\"]\n", currentNum, n->data);
            fprintf(f, "    node%d -- node%d\n", parentNum, currentNum);
            parentNum += 1;
        }
        currentNum += 1;
    
        if (n->l_child != NULL) {
            fill_dot(f, n->l_child, n, parentNum);
        }
    
        if (n->r_child != NULL) {
            fill_dot(f, n->r_child, n, parentNum);
        }
        parentNum -= 1;
        if( parentNum == -1) // Hopefully the end of the recursion.
            currentNum = 0;
    
    }
    
    void dot_graph(char* name, Node* root)
    {
        FILE* res = fopen(name, "w");
        if (res == NULL) {
            printf("File problem: %s\n", name);
            exit(1);
        }
        fprintf(res, "graph {\n"); // Opens the dot file
        fill_dot(res, root, NULL, 0); // fills with the nodes.
        fprintf(res, "}\n"); // Close the dot file.
        fclose(res);
    }

}

###############################################################################

statement <- _ e:expression _ EOL { $$ = e; puts("Expression parsed");}
           / ( !EOL . )* EOL      { printf("error\n"); }

expression <- e:term { $$ = e; }

term <- l:term _ < '+' > _ r:factor { $$ = node(strdup($1), l, r); }
      / l:term _ < '-' > _ r:factor { $$ = node(strdup($2), l, r); }
      / e:factor                { $$ = e; }

factor <- l:factor _ < '*' > _ r:unary { $$ = node(strdup($1), l, r); }
        / l:factor _ < '/' > _ r:unary { $$ = node(strdup($2), l, r); }
        / e:unary                  { $$ = e; }

unary <- < '+' > _ e:unary { $$ = node(strdup($1), e, NULL ); }
       / < '-' > _ e:unary { $$ = node(strdup($2), e, NULL ); }
       / e:primary     { $$ = e; }

primary <- < [0-9]+ >               { $$ = node(strdup($1), NULL, NULL); }
         / '(' _ e:expression _ ')' { $$ = e; }

_      <- [ \t]*
EOL    <- '\n' / '\r\n' / '\r' / ';'

%%
int main() {
    Node* root = node("", NULL, NULL);
    calc_context_t *ctx = calc_create(NULL);
    while (calc_parse(ctx, &root)) { // The root node will be "extended" by childs during parsing.
    }
    dot_graph("calc.dot", root);
    calc_destroy(ctx);
    // No "free" for the nodes to shorten this file.
    return 0;
}

The commands to run packcc, compile and run the parser.

    ./packcc calc.peg && gcc calc.c && ./a.out

The command to watch the graph if you are on linux.

     dot -Tx11 calc.dot

I wonder if the problem is in the parser actions or in my painfully designed dot printer.

It's the way you walk the trees. You're not correctly identifying the parent node during your scan. As far as I can see, it has nothing to do with the parse (and hence packcc), and the results and the code I present below were created without using packcc or the grammar at all; I just created a few trees by hand using your node function. That's usually a better way to debug, as well as being a better way to create minimal examples, because it helps clarify what is irrelevant to the problem (in this case, the parser, which is quite a lot of irrelevant code).

Here's what your function produces, with the correct lines on the right (produced with diff --side-by-side so that you can see the differences):


     -- YOUR CODE --              -- CORRECT --
graph {                       graph {
    node0 [label="+"]             node0 [label="+"]
    node1 [label="4"]             node1 [label="4"]
    node0 -- node1                node0 -- node1
    node2 [label="*"]             node2 [label="*"]
    node0 -- node2                node0 -- node2
    node3 [label="5"]             node3 [label="5"]
    node1 -- node3          |     node2 -- node3
    node4 [label="6"]             node4 [label="6"]
    node1 -- node4          |     node2 -- node4
}                             }

Everything is fine except that node1 is used as the parent of node3 and node4 instead of node2 . Of course, this is a very simple graph. There would be more errors with a larger graph. (I suspect the reason it worked with addition was simply that left associativity produced a graph leaning to the left. Indeed, if you draw a graph for 4*5+6 instead of 4+5*6 , you won't see any problem. (By the way, it's great that you show everything including build instructions. You just missed the input strings:-( But I figured them out.)

"Painfully designed" isn't a bad description. It's a bit of a Rube Goldberg. As a rule of thumb, anytime you find yourself using a static variable in a recursion, you should immediately stop and rethink your approach. It's never the correct solution. If you need to pass data around, use arguments and return values. It's almost always simpler and clearer, and it is always less fragile and more thread-safe.

With that in mind, here's the simple recursive printer. I eliminated the parent argument since you're just using it as a flag:

/* If parent is -1, this is the root node. Otherwise, it's the nodeid of
 * the parent node. nextid is the next node id to use for a child, if any.
 * Returns the next id to use.
 */
static int gv_helper(FILE* f, Node* node, int parent, int nextid) {
  if (node == NULL) return nextid;
  int id = nextid++;   /* Get an id for this node */

  /* Show this node */
  fprintf(f, "    node%d [label=\"%s\"]\n", id, node->data);
  if (parent >= 0)
    fprintf(f, "    node%d -- node%d\n", parent, id);

  /* Recurse over the children. Advance nextid as we go. */
  nextid = gv_helper(f, node->l_child, id, nextid);
  return gv_helper(f, node->r_child, id, nextid);
}

/* The public entry avoids having to pass magical arguments. But sometimes it
 * would be better to give the client access to all the arguments.
 */
int gv_print(FILE* f, Node* root) {
  return gv_helper(f, root, -1, 0);
}

Rather than independently incrementing the node ID and the parent ID (which cannot work, anyway), the code simply observes that the node id assigned in the call is the parent id for the recursive calls on the children. In this way, the recursion helps make the solution clearer, rather than dependent on hard-to-analyse state.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM