
Python AST module can not detect “if” or “for”

I am trying to restrict a user-provided script, with the following visitor:

import ast
import sys

class SyntaxChecker(ast.NodeVisitor):

    def check(self, syntax):
        tree = ast.parse(syntax)
        print(ast.dump(tree), syntax)
        self.visit(tree)

    def visit_Call(self, node):
        print('Called for Call', ast.dump(node))
        if isinstance(node.func, ast.Name) and node.func.id not in allowed_functions:
            raise CodeError("%s is not an allowed function!"%node.func.id)
        elif isinstance(node.func, ast.Attribute) and node.func.value.id not in allowed_classes:
            raise CodeError('{0} is not calling an allowed class'.format(node.func.value.id))
        elif isinstance(node.func, ast.Name) and node.func.id in allowed_classes:
            raise CodeError('You are not allowed to instantiate any class, {0}'.format(node.func.id))
        else:
            ast.NodeVisitor.generic_visit(self, node)

    def visit_Assign(self, node):
        print('Called for Assign', ast.dump(node))
        ast.NodeVisitor.generic_visit(self, node)

    def visit_Attribute(self, node):
        print('Called for Attribute', ast.dump(node))
        if node.value.id not in allowed_classes:
            raise CodeError('"{0}" is not an allowed class'.format(node.value.id))
        elif node.value.id in allowed_classes and isinstance(node.ctx, ast.Store):
            raise CodeError('Trying to change something in a pre-defined class, "{0}" in "{1}"'.format(node.attr, node.value.id))
        else:
            ast.NodeVisitor.generic_visit(self, node)

    def visit_Expr(self, node):
        print('Called for Expr', ast.dump(node))
        ast.NodeVisitor.generic_visit(self, node)

    def visit_Name(self, node):
        print('Called for Name', ast.dump(node))
        if isinstance(node.ctx, ast.Store) and node.id in allowed_classes:
            raise CodeError('Trying to change a pre-defined class, {0}'.format(node.id))
        elif isinstance(node.ctx, ast.Load) and node.id not in safe_names and node.id not in allowed_functions and node.id not in allowed_classes:
            raise CodeError('"{0}" function is not allowed'.format(node.id))
        else:
            ast.NodeVisitor.generic_visit(self, node)

    def generic_visit(self, node):
        print('Called for generic', ast.dump(node))        
        if type(node).__name__ not in allowed_node_types:
            raise CodeError("%s is not allowed!"%type(node).__name__)
        else:
            ast.NodeVisitor.generic_visit(self, node)

if __name__ == '__main__':
    # Check whole file
    x = SyntaxChecker()
    code = open(sys.argv[1], 'r').read()
    try:
        x.check(code)
    except CodeError as e:
        print(repr(e))

    # Or check line by line, considering multiline statements
    code = ''
    for line in open(sys.argv[1], 'r'):
        line = line.strip()
        if line:
            code += line
            try:
                print('[{0}]'.format(code))
                x.check(code)
                code = ''
            except CodeError as e:
                print(repr(e))
                break
            except SyntaxError as e:
                print('********Feeding next line', repr(e))

It is doing fine for the time being, and I will tune it further, but the problem is that it always raises SyntaxError('unexpected EOF while parsing', ('<unknown>', 1, 15, 'for j in A.b():')) when parsing something like this:

for j in A.b():
    print('hey')

and because of this, no for or if gets parsed.

EDIT: I have added code to check the whole code at once, or to check line by line while allowing for multi-line statements.

You are parsing the code line by line, but a for loop does not stand alone. A for loop without a suite is a syntax error. Python expected to find a suite and found the EOF (end of file) instead.

In other words, your parser can only handle Simple Statements and standalone Expressions on one physical line, and Compound Statements if they are directly followed by a Simple Statement or Expression on the same line.
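To make this concrete, here is a minimal sketch using the loop from the question. The helper name parses is only for illustration; it is not part of the ast module.

```python
import ast

header_only = "for j in A.b():"
whole_block = "for j in A.b():\n    print('hey')\n"
same_line = "for j in A.b(): print('hey')"

def parses(source):
    """Return True if ast.parse accepts the source, False on SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True

print(parses(header_only))  # False: the loop header has no suite
print(parses(whole_block))  # True: header and suite are parsed together
print(parses(same_line))    # True: the suite follows on the same line
```

The names A and b are never looked up, because ast.parse only checks syntax and does not execute anything.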

Your code will also fail for:

  • Multiline strings

     somestring = """Containing more
     than one
     line"""

  • Line continuations

     if the_line == 'too long' and \
        a_backslash_was_used in (True, 'true'):
         # your code fails

     somevar = (you_are_allowed_to_use_newlines,
         "inside parentheses and brackets and braces")

Using ast.parse() to check code line by line is not going to work here; it is only suitable for whole suites. On a file-by-file basis, I'd only pass in the whole file.

To check code line by line you need to tokenize it yourself. You can use the tokenize library; it'll report either a SyntaxError exception or a tokenize.TokenError on syntax errors.
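A rough sketch of what such a check could look like with tokenize.generate_tokens. The helper name tokenizes_cleanly is made up for this example, and it treats any tokenizer error as "not OK", which is an assumption about what you want. Note that a bare for header still tokenizes without error, since the missing suite is a parser-level problem, not a tokenizer one.

```python
import io
import tokenize

def tokenizes_cleanly(source):
    """Return True if the tokenizer reaches the end of source without error.

    An open bracket or an unterminated triple-quoted string at the end of
    the source makes the tokenizer raise, so this returns False for them.
    """
    try:
        for _ in tokenize.generate_tokens(io.StringIO(source).readline):
            pass
    except (tokenize.TokenError, SyntaxError):
        return False
    return True

# Hypothetical inputs, each the first physical line of a longer statement.
print(tokenizes_cleanly('somestring = """Containing more\n'))
print(tokenizes_cleanly('somevar = (first_item,\n'))
print(tokenizes_cleanly('x = 1\n'))
```

You could keep appending lines to a buffer until the tokenizer is satisfied and only then hand the buffer to ast.parse(), but you would still need your own handling for block statements.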

If you want to restrict a script, take a look at asteval, either the project itself or its source code. They parse the whole script, then execute based on the resulting AST nodes (limiting what nodes they'll accept).

You can parse the code with ast.parse and then check each node with isinstance(node, (ast.If, ast.For)).
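A minimal sketch of that isinstance check, assuming the whole source is parsed in one go. The function name compound_nodes and the sample source are made up for illustration.

```python
import ast

def compound_nodes(source):
    """Return the type names of all if and for nodes found in source."""
    tree = ast.parse(source)
    return [type(node).__name__
            for node in ast.walk(tree)
            if isinstance(node, (ast.If, ast.For))]

source = "for j in A.b():\n    if j:\n        print('hey')\n"
print(compound_nodes(source))  # ['For', 'If']
```

ast.walk visits the tree breadth-first, so the outer for is reported before the if nested inside it.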
