How do I find the line and column offsets for imported Python modules?
I have a class (based on this answer) that uses ast.NodeVisitor to get a list of modules imported by a Python file. However, I also want to return the line and column offsets for where the module names are located in the file.
Code:
import ast


class ImportFinder(ast.NodeVisitor):
    def __init__(self):
        self.imports = []

    def visit_Import(self, node):
        for i in node.names:
            self.imports.append({'import_type': "import", 'module': i.name})

    def visit_ImportFrom(self, node):
        self.imports.append({'import_type': "from", 'module': node.module})


def parse_imports(source):
    tree = ast.parse(source)
    finder = ImportFinder()
    finder.visit(tree)
    return finder.imports


# Example usage
# (the extra spaces after `import` on the last line are deliberate;
# they are what puts bar.foo at column 9 in the desired output)
sample_file = '''
from foo import bar, baz, frob
import bar.baz
import   bar.foo as baf
'''
parsed_imports = parse_imports(sample_file)
for i in parsed_imports:
    print(i)
Current output:
{'import_type': 'from', 'module': 'foo'}
{'import_type': 'import', 'module': 'bar.baz'}
{'import_type': 'import', 'module': 'bar.foo'}
Desired output:
{'import_type': 'from', 'module': 'foo', 'line': 2, 'column_offset': 5}
{'import_type': 'import', 'module': 'bar.baz', 'line': 3, 'column_offset': 7}
{'import_type': 'import', 'module': 'bar.foo', 'line': 4, 'column_offset': 9}
How do I get the line and column offsets for imported Python module names?
You might consider this as a starting point. It doesn't handle continuation lines, but it would be a Machiavellian coder who wrote:
import \
os
You could handle that by using a filter function to combine the continuations and yield the longer lines.
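A minimal sketch of such a filter, assuming only backslash continuations need joining. The name join_continuations is hypothetical, and column offsets found afterwards are relative to the joined line rather than to the original physical line:

```python
def join_continuations(source):
    """Yield (line_number, text) pairs, merging backslash-continued lines.

    line_number is the 1-based number of the first physical line of each
    logical line. Hypothetical helper; it does not know about strings or
    comments that happen to end in a backslash.
    """
    buffer = []
    start = None
    for no, line in enumerate(source.splitlines(), start=1):
        if start is None:
            start = no
        if line.endswith("\\"):
            # Drop the backslash and keep collecting.
            buffer.append(line[:-1])
            continue
        buffer.append(line)
        yield start, "".join(buffer)
        buffer = []
        start = None
    if buffer:
        # The source ended on a dangling backslash.
        yield start, "".join(buffer)
```

The parser that follows could loop over these pairs in place of enumerating source.splitlines() directly.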
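import re


def parse_imports(source):
    hits = []
    # Strip triple-quoted strings so import-like text inside them is ignored.
    # Caveat: this also removes their newlines, so line numbers reported
    # after a multi-line string will be too small.
    source = re.sub(r"'''.*?'''", "", source, flags=re.DOTALL)
    source = re.sub(r'""".*?"""', "", source, flags=re.DOTALL)
    for no, line in enumerate(source.splitlines()):
        ls = line.lstrip()
        if ls.startswith("from "):
            p1 = ls.split()
            mod = p1[1]
            # Search after the keyword so the name is not matched inside it.
            i1 = line.find(mod, line.find("from") + len("from"))
            hits.append({
                "import_type": p1[0],
                "module": mod,
                "line": no + 1,
                "column_offset": i1
            })
        elif ls.startswith("import "):
            cl = ls.split(',')
            p1 = cl[0].split()
            # Move the search position forward after each name found.
            pos = line.find("import") + len("import")
            for mod in [p1[1]] + [c.strip().split()[0] for c in cl[1:]]:
                i1 = line.find(mod, pos)
                pos = i1 + len(mod)
                hits.append({
                    "import_type": p1[0],
                    "module": mod,
                    "line": no + 1,
                    "column_offset": i1
                })
    return hits


# Example usage
# (the irregular spacing is deliberate; it produces the offsets shown below)
sample_file = '''
from foo import bar, baz, frob
import bar.baz
import   bar.foo as baf
import  os,re,  sys
'''
parsed_imports = parse_imports(sample_file)
for i in parsed_imports:
    print(i)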
Output:
{'import_type': 'from', 'module': 'foo', 'line': 2, 'column_offset': 5}
{'import_type': 'import', 'module': 'bar.baz', 'line': 3, 'column_offset': 7}
{'import_type': 'import', 'module': 'bar.foo', 'line': 4, 'column_offset': 9}
{'import_type': 'import', 'module': 'os', 'line': 5, 'column_offset': 8}
{'import_type': 'import', 'module': 're', 'line': 5, 'column_offset': 11}
{'import_type': 'import', 'module': 'sys', 'line': 5, 'column_offset': 16}
Note: I've just noticed a bug here. I strip out all triple-quoted strings, but I don't compensate for those missing lines in the line count. That'll be tricky.
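One possible way to compensate, sketched under the assumption that it is enough to replace each triple-quoted string with the newlines it contained, so later lines keep their numbers. The names TRIPLE_QUOTED and blank_triple_quoted are hypothetical, and the pattern is still fooled by things like a triple quote inside a comment:

```python
import re

# Either kind of triple-quoted string, spanning lines if necessary.
# The backreference \1 makes the closing quotes match the opening ones.
TRIPLE_QUOTED = re.compile(r"('''|\"\"\").*?\1", re.DOTALL)


def blank_triple_quoted(source):
    """Replace each triple-quoted string with only its newlines."""
    return TRIPLE_QUOTED.sub(
        lambda m: "\n" * m.group(0).count("\n"), source
    )
```

Line numbers are preserved this way; column offsets on a line that also contains a triple-quoted string would still shift.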
As of Python 3.10, ast.alias objects have line and column attributes. That solves your problem for import statements, because the imported names in an import statement are represented as ast.alias objects.
Unfortunately, that doesn't help with from ... import; in an ImportFrom object, the module is an identifier, which is a simple string without attributes. (The names imported from the module are ast.alias objects, so each of those does have location information. But you want the location of the module name.)
Still, the statement itself has line and column attributes, even before v3.10 (lineno and col_offset, plus end_lineno and end_col_offset since Python 3.8), and those tell you where the statement starts and ends. So you could use that information to extract a slice consisting only of the from ... import statement, and then use the tokenize module to get the second token in that statement. (The first token is the from keyword.) That's a bit clunky, but it's got to be easier and more reliable than trying to attack Python source with regular expressions.
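A rough sketch of that approach, assuming Python 3.8 or later for ast.get_source_segment. The function name module_name_position is hypothetical. For a relative import the second token is the leading dot(s), so the position returned is where the dots begin. Note also that col_offset is a UTF-8 byte offset, so the arithmetic assumes no non-ASCII text precedes the statement on its line:

```python
import ast
import io
import tokenize


def module_name_position(source, node):
    """Return (line, column) where the module part of a from-import starts."""
    # Slice out just this statement, then tokenize the slice.
    segment = ast.get_source_segment(source, node)
    tokens = tokenize.generate_tokens(io.StringIO(segment).readline)
    next(tokens)                    # first token: the `from` keyword
    row, col = next(tokens).start   # second token: start of the module name
    # Token positions are relative to the slice; map them back to the file.
    line = node.lineno + row - 1
    if row == 1:
        col += node.col_offset
    return line, col


source = "\nfrom foo import bar, baz, frob\nimport bar.baz\n"
tree = ast.parse(source)
positions = [
    (node.module, *module_name_position(source, node))
    for node in ast.walk(tree)
    if isinstance(node, ast.ImportFrom)
]
```

Here positions ends up as a list of (module, line, column) tuples, one per from ... import statement.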