简体   繁体   中英

How to visualize an Abstract Syntax Tree graphically?

I wrote a simple compiler in C++ and want to visualize the Abstract Syntax Tree it produces. Currently, I dump the AST in a super long string similar to something below:

Program(decls=[ConstDecl(type=BasicTypeKind::Int, value=Num(n=1, loc=Location(1, 21)), name=positive, loc=Location(1, 10)), ConstDecl(type=BasicTypeKind::Int, value=Num(n=-1, loc=Location(2, 21)), name=negative, loc=Location(2, 10)), ConstDecl(type=BasicTypeKind::Int, value=Num(n=100, loc=Location(3, 26)), name=max_heap_size, loc=Location(3, 10)), ConstDecl(type=BasicTypeKind::Character, value=Char(c=99, loc=Location(4, 23)), name=...

As you can see, this dump isn't very human friendly in terms of visualization. One cannot naturally relate the notion a tree to such a long string. I tried the approach to pretty-print the AST and found astpretty , which is for Python. It aims a lot in debugging but what if I want an illustration of the AST? A graphical format surely fits better.

Actually I have the picture about what the output I am looking forward. Graphviz does a great job on this field and various graphs the C++ document tool Doxygen generates are very close to my purpose conceptually.

Putting these together, I want a way to turn an AST in memory as C++ objects into decent graphic output (static one is ok). Any good starting point?

Edit: as comments said, dumping my AST in format Graphviz recognizes is a good starting point. I will try do it this way until new and more concrete problems arise. Thanks guys.

I found a solution that is both fast and acceptable. LLVM-8 includes an option to visualize a Control Flow Graph (CFG) by generating a dot format file from it. Basically this works for more general graph structures as long as you specialize the llvm::GraphTraits and llvm::DOTGraphTraits templates (with a little work around) and the llvm::WriteGraph() will work for you. Here is result without too much fine-tuning:

在此处输入图片说明

given the C-like snippet:

const int IntConstant = 1;
int Array[2];

void main() {
  Printf("hello");
}

This language is a simplification of C and it is case-insensitive. A Printf() is represented by a Write statement as the node in the image. ConstDecl means constant declaration . VarDecl means variable declaration and FuncDef means function definition . Other things are pretty much straightforward.

Before explaining the magic, I would like to point you to these documents that really help how to use the llvm APT involved.

  • GraphTraits to know the signature your code should have.
  • WriteGraph to know what WriteGraph requires of your code.
  • CFGPrinter to know how to specialize your GraphTraits
  • DOTGraphTraits to know how to make your illustration meaningful by giving labels and descriptions to nodes.
  • iterator_facade_base to know how to painlessly write working iterators.

Now with this knowledge bases, you just need to put them together. In fact, follow these steps:

  1. think of how to iterate over each node in the AST and specialize the nodes_iterator implementation.
  2. think of how to iterate over all children of a given node and specialize the ChildIteratorType implementation.
  3. think of how to extract useful information from your nodes and specialize the getNodeLabel() and getNodeDescription() of your DOTGraphTraits .
  4. note the mine: the NodeRef type of GraphTraits must be a pointer! knowing this beforehand can save you a day. This is true for the latest version. Maybe in the future they will remove the restriction.

My current implementation only include the class name of a node as its label and a self-explanatory string as its description without any fine-tuning. If you want more effects, ie, color, shape, etc., you can do it in your DOTGraphTraits .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM