
How do I find the line and column offsets for imported Python modules?

I have a class (based on this answer) that uses ast.NodeVisitor to get a list of modules imported by a Python file. However, I also want to return the line and column offsets where the module names are located in the file.

Code:

import ast

class ImportFinder(ast.NodeVisitor):
    def __init__(self):
        self.imports = []

    def visit_Import(self, node):
        for i in node.names:
            self.imports.append({'import_type': "import", 'module': i.name,})

    def visit_ImportFrom(self, node):
        self.imports.append({'import_type': "from", 'module': node.module})

def parse_imports(source):
    tree = ast.parse(source)
    finder = ImportFinder()
    finder.visit(tree)
    return finder.imports


# Example usage
sample_file = '''
from foo import bar, baz, frob
import bar.baz
import   bar.foo as baf
'''
parsed_imports = parse_imports(sample_file)
for i in parsed_imports:
    print(i)

Current output:

{'import_type': 'from', 'module': 'foo'}
{'import_type': 'import', 'module': 'bar.baz'}
{'import_type': 'import', 'module': 'bar.foo'}

Desired output:

{'import_type': 'from', 'module': 'foo', 'line': 2, 'column_offset': 5}
{'import_type': 'import', 'module': 'bar.baz', 'line': 3, 'column_offset': 7}
{'import_type': 'import', 'module': 'bar.foo', 'line': 4, 'column_offset': 9}

How do I get the line and column offsets for imported Python module names?

You might consider this as a starting point. It doesn't handle continuation lines, but it would be a Machiavellian coder who wrote:

import \
    os

You could handle that by using a filter function to combine the continuations and yield the longer lines.
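A minimal sketch of such a filter, written as a generator. The name join_continuations is hypothetical, and the sketch assumes a trailing backslash always means a continuation (it does not check whether the backslash sits inside a comment or a string). It yields the number of the first physical line with each merged line; column offsets found in a merged line refer to the merged text, not to the original file.

```python
def join_continuations(source):
    """Yield (line_number, text) pairs, merging backslash-continued lines.

    line_number is the 1-based number of the first physical line of each
    merged logical line. (Hypothetical helper, not part of the answer.)
    """
    pending = None   # text accumulated for the current logical line
    start = None     # physical line number where that logical line began
    for no, line in enumerate(source.splitlines(), start=1):
        if pending is None:
            start, pending = no, line
        else:
            pending += line
        if pending.endswith("\\"):
            # Drop the backslash and keep accumulating.
            pending = pending[:-1]
        else:
            yield start, pending
            pending = None
    if pending is not None:
        # The source ended on a dangling backslash.
        yield start, pending


merged = list(join_continuations("import \\\n    os\nimport sys\n"))
```

The loop in parse_imports could then iterate over these pairs instead of enumerate(source.splitlines()).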

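import re

def parse_imports(source):
    hits = []
    # Strip triple-quoted strings so "imports" inside them are ignored.
    # Caveat: removing a multi-line string shifts later line numbers.
    source = re.sub(r"'''.*?'''", "", source, flags=re.DOTALL)
    source = re.sub(r'""".*?"""', "", source, flags=re.DOTALL)
    for no, line in enumerate(source.splitlines()):
        ls = line.lstrip()
        if ls.startswith("from "):
            p1 = ls.split()
            mod = p1[1]
            # Search after the keyword so "from" itself is never matched.
            i1 = line.find(mod, line.find("from") + len("from"))
            hits.append({
                "import_type": p1[0],
                "module": mod,
                "line": no + 1,
                "column_offset": i1
            })
        elif ls.startswith("import "):
            # "import a, b as c, d": split on commas, keep each first word.
            cl = ls.split(',')
            p1 = cl[0].split()
            pos = line.find("import") + len("import")
            for mod in [p1[1]] + [c.strip().split()[0] for c in cl[1:]]:
                # Search from the end of the previous match so a short
                # name is not found inside an earlier, longer one.
                i1 = line.find(mod, pos)
                pos = i1 + len(mod)
                hits.append({
                    "import_type": p1[0],
                    "module": mod,
                    "line": no + 1,
                    "column_offset": i1
                })
    return hits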

# Example usage
sample_file = '''
from foo import bar, baz, frob
import bar.baz
import   bar.foo as baf
import  os,re,  sys
'''
parsed_imports = parse_imports(sample_file)
for i in parsed_imports:
    print(i)

Output:

{'import_type': 'from', 'module': 'foo', 'line': 2, 'column_offset': 5}
{'import_type': 'import', 'module': 'bar.baz', 'line': 3, 'column_offset': 7}
{'import_type': 'import', 'module': 'bar.foo', 'line': 4, 'column_offset': 9}
{'import_type': 'import', 'module': 'os', 'line': 5, 'column_offset': 8}
{'import_type': 'import', 'module': 're', 'line': 5, 'column_offset': 11}
{'import_type': 'import', 'module': 'sys', 'line': 5, 'column_offset': 16}

Note: I've just noticed a bug here. I strip out all triple-quoted strings, but I don't compensate for those missing lines in the line count. That'll be tricky.
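One possible way to compensate, sketched under the assumption that a simple regular expression is good enough to find the triple-quoted strings: replace each string with just the newlines it contained, so every later line keeps its original number. The helper name blank_triple_quoted is hypothetical. Column offsets on the line where a string ends would still shift, and the regex can be fooled by quotes inside comments or other strings.

```python
import re

# Matches '''...''' or """...""", non-greedily and across lines.
_TRIPLE_QUOTED = re.compile(r"'''.*?'''|\"\"\".*?\"\"\"", re.DOTALL)


def blank_triple_quoted(source):
    """Replace each triple-quoted string with only its newlines.

    Hypothetical helper: keeps the line count intact so that line
    numbers computed afterwards still match the original file.
    """
    return _TRIPLE_QUOTED.sub(lambda m: "\n" * m.group(0).count("\n"), source)


original = 'x = """\nimport fake\n"""\nimport os\n'
cleaned = blank_triple_quoted(original)
```

Here import os stays on line 4 of cleaned, the same line it occupies in original, while the import fake text inside the string is gone.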

As of Python 3.10, ast.alias objects have lineno and col_offset attributes. That solves your problem for import statements, because the imported names in an import statement are represented as ast.alias objects.

Unfortunately, that doesn't help with from ... import; in an ImportFrom object, the module is an identifier, which is a plain string without location attributes. (The names imported from the module are ast.alias objects, so each of those does have location information. But you want the location of the module name.)

Still, the statement itself has lineno and col_offset attributes, even before v3.10, and since Python 3.8 it also has end_lineno and end_col_offset, so together those tell you where the statement starts and ends. You could use that information to extract a slice consisting only of the from ... import statement, and then use the tokenize module to get the second token in the statement. (The first token is the from keyword.) That's a bit clunky, but it's got to be easier and more reliable than trying to attack Python source with regular expressions.
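Here is one way that approach might look, assuming Python 3.8 or later. ast.get_source_segment does the slicing from the statement's start and end positions, and tokenize.generate_tokens supplies the tokens. The function name from_import_locations is hypothetical. For a relative import such as from . import x, the second token is the leading dot and module is None.

```python
import ast
import io
import tokenize


def from_import_locations(source):
    """Locate the module name in each `from ... import` statement.

    Hypothetical helper: slices out each statement and tokenizes it.
    """
    results = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ImportFrom):
            continue
        # Source text of just this statement (Python 3.8+).
        segment = ast.get_source_segment(source, node)
        tokens = tokenize.generate_tokens(io.StringIO(segment).readline)
        next(tokens)          # first token: the `from` keyword
        tok = next(tokens)    # second token: start of the module name
        row, col = tok.start  # position relative to the segment
        if row == 1:
            # The segment's first line starts at the statement's column.
            # (col_offset counts UTF-8 bytes; equal to characters for ASCII.)
            col += node.col_offset
        results.append({
            'import_type': 'from',
            'module': node.module,
            'line': node.lineno + row - 1,
            'column_offset': col,
        })
    return results


top_level = from_import_locations("\nfrom foo import bar, baz, frob\n")
nested = from_import_locations("def f():\n    from a.b import c\n")
```

Combined with alias positions for plain import statements, this would cover both cases in the question.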
