PLY yacc parser loses the first term after every newline

I am trying to write a simple parser with PLY, but the parser below loses the first string after every NEWLINE.

The input is "a b c\nb d e\nc f".

My parser parses the first line as state (0, ((('a', 'b'), 'c'), 0)), but the next token, 'b', is lost. The second line is parsed as state (0, (('d', 'e'), 0)). How do I fix this?

import ply.lex as lex
import ply.yacc as yacc

tokens = ('STRING', 'NEWLINE')
t_STRING  = r'[^ \n]+'
t_ignore = r' '

def t_NEWLINE(t):
    r'\n'
    t.lexer.lineno += 1
    return t

def t_error(t):
    print("Illegal character %s" % t.value[0])
    t.lexer.skip(1)

# statement: one line of items, optionally ending in a newline
def p_statement_interactive(p):
    '''statement : plist
                 | plist NEWLINE'''
    p[0] = (0, (p[1], 0))
    print("state", p[0])

# plist: one or more items, built up left-recursively
def p_item_string_expr(p):
    '''plist : plist pitem
             | pitem'''
    if len(p) > 2:
        p[0] = (p[1], p[2])
    else:
        p[0] = p[1]
    print("str2", p[0])

def p_item_string(p):
    '''pitem : STRING'''
    p[0] = p[1]
    print("str1", p[0])

def p_error(p):
    if not p:
        print("SYNTAX ERROR AT EOF")

def main():
    data = """a b c
    b d e
    c f"""

    lexer = lex.lex(debug=0)
    lexer.input(data)

    while True:
        tok = lexer.token()
        if not tok:
            break      # No more input
        print(tok)

    parser = yacc.yacc()
    parser.parse(data)

if __name__ == '__main__':
    main()

Result is:

LexToken(STRING,'a',1,0)
LexToken(STRING,'b',1,2)
LexToken(STRING,'c',1,4)
LexToken(NEWLINE,'\n',1,5)
LexToken(STRING,'b',2,10)
LexToken(STRING,'d',2,12)
LexToken(STRING,'e',2,14)
LexToken(NEWLINE,'\n',2,15)
LexToken(STRING,'c',3,20)
LexToken(STRING,'f',3,22)
str1 a
str2 a
str1 b
str2 ('a', 'b')
str1 c
str2 (('a', 'b'), 'c')
state (0, ((('a', 'b'), 'c'), 0))
str1 d
str2 d
str1 e
str2 ('d', 'e')
state (0, (('d', 'e'), 0))
str1 f
str2 f
state (0, ('f', 0))

Your p_error function:

def p_error(p):
    if not p:
        print("SYNTAX ERROR AT EOF")

silently ignores errors except at the end of input. Silently ignoring errors is almost always wrong, and almost always confusing, as it is in this case.

Your statement production only accepts a single line, possibly terminated with a newline character. No token other than the end-of-file indicator can follow that newline. So the second b token (the one at the beginning of the second line) causes a syntax error.

Since syntax errors are being silently ignored, there is no indication of this error. PLY then enters error recovery mode and, because the grammar has no error rules, the parser effectively restarts. However, the offending token b has already been "handled", so the restart begins at the next token, d.

The same thing happens after the second newline: the c at the beginning of the third line causes a syntax error, which is silently ignored; the c is then discarded and the parser restarts at f.
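One way to see what the question's p_error hides is to record every token PLY passes to it. The sketch below assumes PLY 3.x and keeps the question's tokens and grammar; the `errors` list and the `parse` helper are hypothetical, and the actions build lists instead of the question's nested tuples so the results are easier to read.

```python
import ply.lex as lex
import ply.yacc as yacc

# Same tokens and grammar as the question; only p_error is different.
tokens = ('STRING', 'NEWLINE')
t_STRING = r'[^ \n]+'
t_ignore = ' '

def t_NEWLINE(t):
    r'\n'
    t.lexer.lineno += 1
    return t

def t_error(t):
    t.lexer.skip(1)

def p_statement(p):
    '''statement : plist
                 | plist NEWLINE'''
    p[0] = p[1]

def p_plist(p):
    '''plist : plist pitem
             | pitem'''
    if len(p) > 2:
        p[0] = p[1] + [p[2]]   # append the new item to the line so far
    else:
        p[0] = [p[1]]          # first item on the line

def p_pitem(p):
    '''pitem : STRING'''
    p[0] = p[1]

# Tokens handed to p_error during the current parse (hypothetical helper state).
errors = []

def p_error(p):
    # Record the error instead of silently returning; p is None at end of input.
    errors.append(None if p is None else (p.value, p.lineno))

def parse(data):
    # Returns the parse result plus every token that p_error saw.
    errors.clear()
    lexer = lex.lex()
    parser = yacc.yacc(debug=False, write_tables=False)  # no parser.out / parsetab.py
    result = parser.parse(data, lexer=lexer)
    return result, list(errors)

if __name__ == '__main__':
    result, errs = parse("a b c\nb d e\nc f")
    print(errs)
    print(result)
```

With the question's input, `errs` should be `[('b', 2), ('c', 3)]`, exactly the two tokens that vanished, and the returned value is only `['f']`, because each restart throws away the statement parsed before it.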

It's not clear to me what you expect to happen. One possibility would be to raise SyntaxError (or some other exception type) in p_error, rather than just returning, to terminate the parse. However, the erroneous token will already have been consumed, so you cannot resume parsing from it.
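A minimal sketch of that first option, assuming PLY 3.x: p_error raises SyntaxError naming the offending token and its line, so the parse stops at the first problem instead of recovering silently. The message text and the `parse` helper are illustrative choices, not part of PLY.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('STRING', 'NEWLINE')
t_STRING = r'[^ \n]+'
t_ignore = ' '

def t_NEWLINE(t):
    r'\n'
    t.lexer.lineno += 1
    return t

def t_error(t):
    t.lexer.skip(1)

# The question's single-line grammar, unchanged.
def p_statement(p):
    '''statement : plist
                 | plist NEWLINE'''
    p[0] = p[1]

def p_plist(p):
    '''plist : plist pitem
             | pitem'''
    if len(p) > 2:
        p[0] = p[1] + [p[2]]
    else:
        p[0] = [p[1]]

def p_pitem(p):
    '''pitem : STRING'''
    p[0] = p[1]

def p_error(p):
    # Abort the parse at the first error and say where it happened.
    if p is None:
        raise SyntaxError("syntax error at end of input")
    raise SyntaxError("syntax error at %r (line %d)" % (p.value, p.lineno))

def parse(data):
    lexer = lex.lex()
    parser = yacc.yacc(debug=False, write_tables=False)
    return parser.parse(data, lexer=lexer)

if __name__ == '__main__':
    try:
        parse("a b c\nb d e\nc f")
    except SyntaxError as exc:
        print(exc)  # syntax error at 'b' (line 2)
```

Because the exception propagates out of `parser.parse`, the caller sees the error instead of a silently truncated result.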

Or you might want to accept any number of statements. In that case, your statement rule should be something like

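statement : plist
          | statement NEWLINE
          | statement NEWLINE plist

The first option handles the first line, the second allows blank lines and a trailing newline, and the third handles each subsequent line; the actions associated with the first and third options would do whatever you need done with each plist.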
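A runnable sketch of the multi-statement approach, assuming PLY 3.x. It assumes the first alternative is `statement : plist`, so the first line does not need a preceding newline. Making `statement` collect a list of lines (each a list of strings), the `parse` helper, and raising SyntaxError in `p_error` are illustrative choices, not part of the original answer.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ('STRING', 'NEWLINE')
t_STRING = r'[^ \n]+'
t_ignore = ' '

def t_NEWLINE(t):
    r'\n'
    t.lexer.lineno += 1
    return t

def t_error(t):
    t.lexer.skip(1)

# statement now means "all lines so far"; its value is a list of lines.
def p_statement_first(p):
    '''statement : plist'''
    p[0] = [p[1]]

def p_statement_newline(p):
    '''statement : statement NEWLINE'''
    p[0] = p[1]                 # blank line or trailing newline: nothing to add

def p_statement_next(p):
    '''statement : statement NEWLINE plist'''
    p[0] = p[1] + [p[3]]        # handle the plist of the next line

def p_plist(p):
    '''plist : plist pitem
             | pitem'''
    if len(p) > 2:
        p[0] = p[1] + [p[2]]
    else:
        p[0] = [p[1]]

def p_pitem(p):
    '''pitem : STRING'''
    p[0] = p[1]

def p_error(p):
    # Report errors instead of ignoring them.
    if p is None:
        raise SyntaxError("syntax error at end of input")
    raise SyntaxError("syntax error at %r (line %d)" % (p.value, p.lineno))

def parse(data):
    lexer = lex.lex()
    parser = yacc.yacc(debug=False, write_tables=False)
    return parser.parse(data, lexer=lexer)

if __name__ == '__main__':
    print(parse("a b c\nb d e\nc f"))
```

On the question's input this should return `[['a', 'b', 'c'], ['b', 'd', 'e'], ['c', 'f']]`. The indentation inside the question's triple-quoted string makes no difference, because spaces are in t_ignore.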

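Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM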