Debugging Pyparsing Grammar

I'm building a parser for an imaginary programming language called C-- (not the actual C-- language). I've reached the stage where I need to translate the language's grammar into something Pyparsing can accept. Unfortunately, when I parse my input string (which is correct and should not cause Pyparsing to raise an error), it does not parse correctly. I suspect this is due to errors in my grammar, but as this is my first time using Pyparsing, I can't see where I'm going wrong.

I've uploaded the grammar that I'm translating from here for people to read through.

EDIT: Updated with the advice from Paul.

This is the grammar I've currently got (the first two lines of the Syntax definition are terribly bad of me, I know):

# Lexical structure definition
ifS = Keyword('if')
elseS = Keyword('else')
whileS = Keyword('while')
returnS = Keyword('return')
intVar = Keyword('int')
voidKeyword = Keyword('void')
sumdiff = Literal('+') | Literal('-')
prodquot = Literal('*') | Literal('/')
relation = Literal('<=') | Literal('<') | Literal('==') | \
           Literal('!=') | Literal('>') | Literal('=>')
lbrace = Literal('{')
rbrace = Literal('}')
lparn = Literal('(')
rparn = Literal(')')
semi = Literal(';')
comma = Literal(',')
number = Word(nums)
identifier = Word(alphas, alphanums)

# Syntax definition
term = ''
statement = ''
variable    =   intVar + identifier + semi
locals      =   ZeroOrMore(variable)
expr        =   term | OneOrMore(Group(sumdiff + term))
args        =   ZeroOrMore(OneOrMore(Group(expr + comma)) | expr)
funccall    =   Group(identifier + lparn + args + rparn)
factor      =   Group(lparn + expr + rparn) | identifier | funccall | number
term        =   factor | OneOrMore(prodquot + factor)
cond        =   Group(lparn + expr + relation + expr + rparn)
returnState =   Group(returnS + semi) | Combine(returnS + expr + semi)
assignment  =   Group(identifier + '=' + expr + semi)
proccall    =   Group(identifier + lparn + args + rparn + semi)
block       =   Group(lbrace + locals + statement + rbrace)
iteration   =   Group(whileS + cond + block)
selection   =   Group(ifS + cond + block) | Group(ifS + cond + block + elseS + block)
statement   =   OneOrMore(proccall | assignment | selection | iteration | returnState)
param       =   Group(intVar + identifier)
paramlist   =   OneOrMore(Combine(param + comma)) | param
params      =   paramlist | voidKeyword
procedure   =   Group(voidKeyword + identifier + lparn + params + rparn + block)
function    =   Group(intVar + identifier + lparn + params + rparn + block)
declaration =   variable | function | procedure
program     =   OneOrMore(declaration)

I'd like to know whether I've made any mistakes in translating the grammar, and what improvements I could make to simplify it while still adhering to the grammar I've been given.

EDIT 2: Updated to include the new error.

Here is the input string I am parsing:

int larger ( int first , int second ) { 
if ( first > second ) { 
return first ; 
} else { 
return second ; 
} 
} 

void main ( void ) { 
int count ; 
int sum ; 
int max ; 
int x ; 

x = input ( ) ; 
max = x ; 
sum = 0 ; 
count = 0 ; 

while ( x != 0 ) { 
count = count + 1 ; 
sum = sum + x ; 
max = larger ( max , x ) ; 
x = input ( ) ; 
} 

output ( count ) ; 
output ( sum ) ; 
output ( max ) ; 
} 

And this is the error message I get when running my program from Terminal:

/Users/Joe/Documents/Eclipse Projects/Parser/src/pyparsing.py:1156: SyntaxWarning: null string passed to Literal; use Empty() instead
other = Literal( other )
/Users/Joe/Documents/Eclipse Projects/Parser/src/pyparsing.py:1258: SyntaxWarning: null string passed to Literal; use Empty() instead
other = Literal( other )
Expected ")" (at char 30), (line:6, col:26)
None

1) Change Literal("if") to Keyword("if") (and so on, down to Literal("void")), to prevent matching the leading "if" of a variable named "ifactor".
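The difference in point 1 can be seen in a small sketch. It assumes pyparsing is installed and uses the classic parseString method name; the helper matches is hypothetical and exists only for this illustration.

```python
from pyparsing import Keyword, Literal, ParseException

lit_if = Literal("if")
kw_if = Keyword("if")


def matches(expr, text):
    """Hypothetical helper: True if expr matches at the start of text."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


# Literal only compares characters, so it accepts the prefix of "ifactor".
literal_hits_prefix = matches(lit_if, "ifactor")
# Keyword also requires that no identifier character follows the match.
keyword_hits_prefix = matches(kw_if, "ifactor")
keyword_hits_real_if = matches(kw_if, "if (x)")

print(literal_hits_prefix, keyword_hits_prefix, keyword_hits_real_if)
```

With Literal, an identifier such as "ifactor" would be split into a spurious "if" token followed by leftover text, which is why the reserved words are better declared as Keyword.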

2) nums, alphas, and alphanums are not expressions; they are strings that can be used with the Word class to define typical sets of characters when defining "words", as in "a number is a word made up of nums" or "an identifier is a word that starts with an alpha, followed by zero or more alphanums." So instead of:

number = nums
identifier = alphas + OneOrMore(alphanums)

you want

number = Word(nums)
identifier = Word(alphas, alphanums)
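As a quick check of point 2, the following sketch (assuming pyparsing is installed; the sample inputs are made up) shows that the character-set names are plain strings and that Word turns them into working expressions.

```python
from pyparsing import Word, alphanums, alphas, nums

# nums, alphas and alphanums are ordinary Python strings of characters.
print(type(nums).__name__, nums)

number = Word(nums)                    # one or more digits
identifier = Word(alphas, alphanums)   # a letter, then letters or digits

number_tokens = number.parseString("2024").asList()
# parseString stops after the first match, so only the name is returned.
identifier_tokens = identifier.parseString("count2 = 0").asList()
print(number_tokens, identifier_tokens)
```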

3) Instead of Combine, I think you want Group. Use Combine when you want the matched tokens to be contiguous with no intervening whitespace; it concatenates the tokens and returns them as a single string. Combine is often used in cases like this:

realnum = Combine(Word(nums) + "." + Word(nums))

Without Combine, parsing "3.14" would return the list of strings ['3', '.', '14'], so we add Combine so that the parsed result for realnum is '3.14' (which you could then pass to a parse action to convert to the actual floating-point value 3.14). Combine's enforcement of no intervening whitespace also keeps us from accidentally parsing 'The answer is 3. 10 is too much.' and thinking the "3. 10" represents a real number.
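A minimal sketch of the Combine behaviour described above, contrasted with Group. It assumes pyparsing is installed; the names loose, as_float and param are illustrative and not taken from the question's grammar.

```python
from pyparsing import (Combine, Group, Keyword, ParseException, Word,
                       alphanums, alphas, nums)

loose = Word(nums) + "." + Word(nums)
realnum = Combine(Word(nums) + "." + Word(nums))

loose_tokens = loose.parseString("3.14").asList()       # separate pieces
combined_tokens = realnum.parseString("3.14").asList()  # one joined string

# A parse action can convert the joined string to a float.
as_float = realnum.copy().setParseAction(lambda t: float(t[0]))
float_value = as_float.parseString("3.14")[0]

# Combine refuses whitespace between the pieces.
try:
    realnum.parseString("3. 10")
    spaced_ok = True
except ParseException:
    spaced_ok = False

# Group keeps the tokens separate but nests them in their own sublist,
# and still allows whitespace between them.
param = Group(Keyword("int") + Word(alphas, alphanums))
param_tokens = param.parseString("int first").asList()

print(loose_tokens, combined_tokens, float_value, spaced_ok, param_tokens)
```

For whitespace-separated constructs such as "int first", Group gives the structure without the adjacency requirement that Combine imposes.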

4) This should not cause your error, but your input string has lots of extra spaces. If you get your grammar working, you should be able to parse "int x;" just as well as "int x ;".
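To illustrate point 4, here is a small sketch (assuming pyparsing is installed) of a variable declaration rule modelled on the one in the question. Pyparsing skips whitespace between tokens by default, so both spellings yield the same result.

```python
from pyparsing import Keyword, Literal, Word, alphanums, alphas

variable = Keyword("int") + Word(alphas, alphanums) + Literal(";")

tight = variable.parseString("int x;").asList()
spaced = variable.parseString("int   x  ;").asList()
print(tight, spaced)
```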

Hope some of these hints get you going. Have you read any online pyparsing articles or tutorials? And please look through the online examples. You'll need to get a good grasp of how Word, Literal, Combine, etc. perform their individual parsing tasks.

5) You have mis-implemented the recursive definitions for term and statement. Instead of assigning '' to them, write:

term = Forward()
statement = Forward()

Then when you go to actually define them with their recursive definitions, use the << operator (and be sure to enclose the RHS in ()'s).

term << (... term definition ...)
statement << (... statement definition ...)
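The Forward pattern can be sketched on a cut-down expression grammar. This assumes pyparsing is installed; the rule shapes (a factor followed by zero or more operator/factor pairs) are an assumption about what the grammar intends, not a transcription of the question's rules.

```python
from pyparsing import (Forward, Group, Literal, Word, ZeroOrMore,
                       alphanums, alphas, nums)

number = Word(nums)
identifier = Word(alphas, alphanums)
sumdiff = Literal("+") | Literal("-")
prodquot = Literal("*") | Literal("/")
lparn, rparn = Literal("("), Literal(")")

# Placeholder, so expr and factor can refer to term before it is defined.
term = Forward()
expr = term + ZeroOrMore(sumdiff + term)
factor = Group(lparn + expr + rparn) | identifier | number

# Fill in the placeholder; the parentheses around the RHS matter.
term << (factor + ZeroOrMore(prodquot + factor))

tokens = expr.parseString("sum + x * ( 2 + y )", parseAll=True).asList()
print(tokens)
```

Because term is a Forward object rather than an empty string, the expressions built earlier keep a reference to it and pick up the real definition once << is applied.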

You can find an example of a recursive parser here, and a presentation on basic pyparsing usage here - see the section titled "Parsing Lists" for more step-by-step detail on how the recursion is handled.
