Python PLY: Getting a syntax error on every line of input

Question

I'm writing a compiler for the for loop construct of C. However, I'm still stuck at the preliminary task of parsing the opening part of a C program, namely the header files to be included and the main function.

Here is my code:

import ply.lex as lex
import ply.yacc as yacc
tokens = ('HASH','INCLUDE','HEADER_FILE','MAIN','FLOW_OPEN','FLOW_CLOSE','SEMI_COLON','TYPE','SMALL_OPEN','SMALL_CLOSE','OTHERS')

t_HASH = r'\#'
t_INCLUDE = r'include'
t_HEADER_FILE = r'<stdio.h>'
t_MAIN = r'main' 
t_FLOW_OPEN = r'{'
t_FLOW_CLOSE = r'}'
t_SMALL_OPEN = r'\('
t_SMALL_CLOSE = r'\)'
t_SEMI_COLON = r';'
t_OTHERS = r'[a-zA-Z][a-zA-Z]*'
t_TYPE = r'int|void'

def t_error(token):
    print(f'Illegal character: {token.value}')

def t_whitespace(t):
    r'\s+'
    pass

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

lexer = lex.lex()
#Building the parser

def p_expression_start(p):
    'expression : header body'

def p_header(p):
    'header : HASH INCLUDE HEADER_FILE'

def p_body(p):
    'body : main rest'

def p_main(p):
    'main : TYPE MAIN SMALL_OPEN SMALL_CLOSE'

def p_rest(p):
    'rest : FLOW_OPEN st FLOW_CLOSE'

def p_st(p):
    ''''
        st : OTHERS st
            | end
        '''
def p_end(p): #Empty production
    'end : SEMI_COLON' 

def p_error(p):
    print("Syntax error in input!")

parser = yacc.yacc(method='LALR',debug=True)

with open(r'forparsing.txt','r') as file:
    while True:
        try:
            line = next(file)
            print('Parsing')
            parser.parse(line)
        except:
            print('Finished')
            break

And the input I'm giving is:

# include <stdio.h>
void main()
{
 abc;
 }

But on running the program, I get a syntax error on each line. What could be wrong here? From my understanding, the parser is not able to derive the start symbol from the given input, but I don't know how to go about fixing this. In general, how do I debug syntax error issues with PLY?

Answer 1

None of your lines of input are syntactically valid on their own. They only form a syntactically valid program when parsed as a whole. So you'll need to call parse once with a string containing the whole program, not once per line.

You can do this by calling file.read() in your file-handling code instead of using a while loop.
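A minimal sketch of that change, assuming the parser object built in the question. The name parse_file is a hypothetical helper for illustration, not part of PLY. It reads the file in one go and calls parse exactly once; it also drops the bare except, so real errors are no longer reported as "Finished":

```python
def parse_file(parser, path):
    """Parse the whole file at `path` with a single call to parser.parse().

    `parser` is assumed to be any object with a PLY-style parse(text)
    method, for example the result of yacc.yacc() in the question.
    """
    with open(path, 'r') as file:
        source = file.read()      # the entire program as one string
    return parser.parse(source)   # one parse call for the whole program
```

With the code from the question, the call would be parse_file(parser, 'forparsing.txt').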

The syntax error you'll run into after fixing this is due to the way that overlapping lexical rules are handled in PLY. In sane lexer generators, the rule that produces the longest match wins and, if several rules produce the same match, the one that comes first in the code wins. In PLY, however, rules defined as strings are tried in order of decreasing regex length, so the one with the longest regex wins. Because of this behavior, you can't use separate rules to match identifiers and keywords in PLY. In this case, the t_OTHERS rule is used even if, say, t_INCLUDE also matches.
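The effect can be reproduced without PLY. The sketch below is a simplified model, not PLY's actual code: it takes the word-like string rules from the question, sorts them by decreasing regex length as described above, and reports the first rule that matches. The function name first_matching_rule is made up for this illustration.

```python
import re

# Word-like string rules from the question, as (token name, regex) pairs.
rules = [
    ('INCLUDE', r'include'),
    ('MAIN', r'main'),
    ('OTHERS', r'[a-zA-Z][a-zA-Z]*'),
    ('TYPE', r'int|void'),
]

def first_matching_rule(text, rules):
    """Return the name of the first rule matching at the start of `text`,
    trying the rules longest-regex-first (a model of the ordering only)."""
    ordered = sorted(rules, key=lambda rule: len(rule[1]), reverse=True)
    for name, pattern in ordered:
        if re.match(pattern, text):
            return name
    return None

for word in ('include', 'main', 'void', 'abc'):
    print(word, '->', first_matching_rule(word, rules))
```

Because the OTHERS regex is the longest of the four, it is tried first and claims every word, so the parser never sees an INCLUDE, MAIN, or TYPE token.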

Instead, the PLY documentation recommends the following way of matching identifiers and keywords:

To handle reserved words, you should write a single rule to match an identifier and do a special name lookup in a function like this:

  reserved = {
     'if' : 'IF',
     'then' : 'THEN',
     'else' : 'ELSE',
     'while' : 'WHILE',
     ...
  }

  tokens = ['LPAREN','RPAREN',...,'ID'] + list(reserved.values())

  def t_ID(t):
      r'[a-zA-Z_][a-zA-Z_0-9]*'
      t.type = reserved.get(t.value,'ID')    # Check for reserved words
      return t

This approach greatly reduces the number of regular expression rules and is likely to make things a little faster.

Note: You should avoid writing individual rules for reserved words. For example, if you write rules like this,

  t_FOR   = r'for'
  t_PRINT = r'print'

those rules will be triggered for identifiers that include those words as a prefix such as "forget" or "printed". This is probably not what you want.

Again, it should be pointed out that neither of the issues mentioned there exists in lexer generators that use the maximum munch rule.
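Applied to the token set from the question, the documented pattern could look like the sketch below. It assumes PLY 3.x is installed. The reserved table, the widened identifier regex, the escaped dot and braces, and the use of t_ignore in place of the t_whitespace rule are choices made for this illustration, not part of the original code.

```python
import ply.lex as lex

# Reserved words map to their token types; 'int' and 'void' share TYPE.
reserved = {
    'include': 'INCLUDE',
    'main': 'MAIN',
    'int': 'TYPE',
    'void': 'TYPE',
}

tokens = ['HASH', 'HEADER_FILE', 'FLOW_OPEN', 'FLOW_CLOSE', 'SEMI_COLON',
          'SMALL_OPEN', 'SMALL_CLOSE', 'OTHERS'] + sorted(set(reserved.values()))

t_HASH = r'\#'
t_HEADER_FILE = r'<stdio\.h>'   # dot escaped so it matches a literal '.'
t_FLOW_OPEN = r'\{'
t_FLOW_CLOSE = r'\}'
t_SMALL_OPEN = r'\('
t_SMALL_CLOSE = r'\)'
t_SEMI_COLON = r';'
t_ignore = ' \t'                # skip spaces and tabs, but not newlines

def t_OTHERS(t):
    r'[a-zA-Z_][a-zA-Z_0-9]*'
    # Single identifier rule; reserved words are re-typed by table lookup.
    t.type = reserved.get(t.value, 'OTHERS')
    return t

def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

def t_error(t):
    print(f'Illegal character: {t.value[0]!r}')
    t.lexer.skip(1)             # skip the bad character and keep lexing

lexer = lex.lex()

source = '# include <stdio.h>\nvoid main()\n{\n abc;\n }\n'
lexer.input(source)
print([tok.type for tok in lexer])
```

With this lexer, include, main, and void reach the parser as INCLUDE, MAIN, and TYPE, and only abc is left as OTHERS, which is what the grammar in the question expects.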

In general, how do I debug syntax error issues with PLY?

The first step would be to change p_error to print out some useful information (such as which type of token on which line caused the syntax error), like this:

def p_error(p):
    if p is None:  # PLY passes None when the error is at the end of the input
        token = "end of file"
    else:
        token = f"{p.type}({p.value}) on line {p.lineno}"

    print(f"Syntax error: Unexpected {token}")

1 2019-02-28 20:11:43