
In Python 2.7 ANTLR4, extract tokens from a parser rule and store them in a list

In my grammar I validate boolean expressions that look something like this:

((foo == true) && (bar != false) || (qux == norf))

I get the expression string from ANTLR4's context object by calling getText():

def enterBody(self, ctx):
    expression = ctx.condition.getText()  # `condition` is the label on a rule element in the grammar (`condition=expr`)

However, the string comes back as one piece (i.e. with no spaces between the individual tokens), so I have no way of telling where each token starts and ends:

((foo==true)&&(bar!=false)||(qux==norf))

Ideally, I would like it stored in a list in the following format:

['(', '(', 'foo', '==', 'true', ')', '&&', '(', 'bar', '!=', 'false', ')', '||', '(', 'qux', '==', 'norf', ')', ')']

The ANTLR4 Python documentation is rather sparse, and I'm not sure whether there's a method that does this.

The Python runtime closely mirrors the Java runtime, so you can consult the Java documentation; the same method most likely exists in Python. Alternatively, browse the runtime's source code, which is quite easy to read.

You're asking for a flat list of strings, but the whole point of a parser is to get away from that, so I think it is most likely not what you need. Make sure you understand the parse tree and how listeners work. Basically, you should work with the tree, not with a flat list. What you are probably looking for is ParserRuleContext.getChildren(), which you can use to access all child nodes:

def enterBody(self, ctx):
    # getChildren() yields the direct children: sub-rule contexts and terminal (token) nodes
    print(list(ctx.getChildren()))
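Note that getChildren() only yields the direct children of a node, so nested sub-expressions come back as context objects rather than tokens. If you really do want the flat list from the question, one option is a depth-first walk that collects the text of every leaf (terminal) node. The sketch below is a minimal illustration that relies only on getChildCount(), getChild(i) and getText(), which ANTLR4's Python parse-tree nodes provide. The Rule and Leaf classes are hypothetical stand-ins so the snippet runs without a generated parser; in a real listener you would call leaf_texts(ctx.condition) instead.

```python
def leaf_texts(tree):
    """Collect the text of every leaf node under `tree`, left to right.

    Uses only getChildCount(), getChild(i) and getText(), which ANTLR4
    parse-tree nodes (rule contexts and terminal nodes) provide.
    """
    result = []
    stack = [tree]  # explicit stack avoids Python's recursion limit on deep trees
    while stack:
        node = stack.pop()
        count = node.getChildCount()
        if count == 0:
            text = node.getText()
            if text:  # an empty rule context (epsilon match) has no text; skip it
                result.append(text)
        else:
            # Push children in reverse so the leftmost child is processed first
            for i in range(count - 1, -1, -1):
                stack.append(node.getChild(i))
    return result


# Hypothetical stand-ins for ANTLR nodes, only so this sketch runs on its own.
class Leaf(object):
    def __init__(self, text):
        self.text = text

    def getChildCount(self):
        return 0

    def getChild(self, i):
        return None

    def getText(self):
        return self.text


class Rule(object):
    def __init__(self, *children):
        self.children = list(children)

    def getChildCount(self):
        return len(self.children)

    def getChild(self, i):
        return self.children[i]

    def getText(self):
        return "".join(child.getText() for child in self.children)


# Roughly the shape of a parse tree for "(foo == true)"
tree = Rule(Leaf("("), Rule(Leaf("foo"), Leaf("=="), Leaf("true")), Leaf(")"))
print(leaf_texts(tree))  # ['(', 'foo', '==', 'true', ')']
```

An explicit stack is used instead of recursion so that very deeply nested expressions don't hit Python's default recursion limit.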

More likely still, you want to access a specific type of child node to perform some action. Take a look at the parser ANTLR generated for you: you will find a bunch of *Context classes, which contain methods for accessing each type of subnode. For example, the ctx parameter of the enterBody() method is an instance of BodyContext, and you can use all of its methods to access its child nodes of a specific type.
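As a rough illustration of what "accessing children of a specific type" means: the generated accessor methods essentially filter a node's children by their class for you. Written by hand over getChildren(), the same idea might look like the sketch below. ExprContext and TerminalStub are hypothetical stand-ins (for something like BooleanParser.ExprContext and antlr4.tree.Tree.TerminalNode) so the snippet runs without a generated parser; in practice, prefer the generated accessors.

```python
def children_of_type(ctx, node_type):
    """Return the direct children of ctx that are instances of node_type, in order."""
    return [child for child in ctx.getChildren() if isinstance(child, node_type)]


# Hypothetical stand-ins; in real code these would be generated/runtime classes.
class TerminalStub(object):
    def __init__(self, text):
        self.text = text


class ExprContext(object):
    def __init__(self, *children):
        self.children = list(children)

    def getChildren(self):
        # Like ANTLR's ParserRuleContext.getChildren(), yield the direct children
        for child in self.children:
            yield child


# Children of an expression shaped like "(foo==true) && (bar!=false)"
body = ExprContext(ExprContext(TerminalStub("foo")), TerminalStub("&&"),
                   ExprContext(TerminalStub("bar")))
print(len(children_of_type(body, ExprContext)))                 # 2
print([t.text for t in children_of_type(body, TerminalStub)])  # ['&&']
```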

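UPDATE: If your grammar only defines a boolean expression and you use it only for validation and tokenization, you might not need the parser at all. Keep in mind, though, that the lexer only splits the input into tokens; checking the structure of the expression (for example, balanced parentheses) is still the parser's job. To get a list of all tokens, just run the lexer: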

import antlr4
from BooleanLexer import BooleanLexer  # lexer module generated by ANTLR from the grammar

input_stream = antlr4.FileStream('input.txt')

# Instantiate and run the generated lexer
lexer = BooleanLexer(input_stream)
tokens = antlr4.CommonTokenStream(lexer)

# Lex all tokens until EOF
tokens.fill()

# Print tokens as text (the trailing EOF token is dropped)
print([token.text for token in tokens.tokens][:-1])
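The [:-1] slice relies on EOF being the last token, and tokens.tokens also contains any tokens the grammar sends to a hidden channel (for example whitespace declared with -> channel(HIDDEN); whitespace declared with -> skip never appears). A slightly more defensive sketch filters on each token's type and channel instead. It assumes the ANTLR4 runtime values Token.EOF == -1 and Token.DEFAULT_CHANNEL == 0 (with the real runtime you would import Token from antlr4 and use those constants), and FakeToken is a hypothetical stand-in so the snippet runs without a generated lexer. With the real runtime you would pass tokens.tokens after calling tokens.fill().

```python
from collections import namedtuple

# Values of antlr4 Token.EOF and Token.DEFAULT_CHANNEL in the ANTLR4 runtime
EOF = -1
DEFAULT_CHANNEL = 0


def visible_token_texts(token_list):
    """Return the text of every default-channel token, excluding EOF."""
    return [t.text for t in token_list
            if t.type != EOF and t.channel == DEFAULT_CHANNEL]


# Hypothetical stand-in for antlr4 Token objects (only the fields used above)
FakeToken = namedtuple("FakeToken", ["type", "text", "channel"])

sample = [
    FakeToken(1, "(", 0),
    FakeToken(2, "foo", 0),
    FakeToken(3, " ", 1),  # whitespace on the hidden channel
    FakeToken(4, "==", 0),
    FakeToken(3, " ", 1),
    FakeToken(2, "true", 0),
    FakeToken(5, ")", 0),
    FakeToken(EOF, "<EOF>", 0),
]
print(visible_token_texts(sample))  # ['(', 'foo', '==', 'true', ')']
```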

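Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.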

 
© 2020-2024 STACKOOM.COM