
Parsing a string as a Python argument list

Summary

I would like to parse a string that represents a Python argument list into a form that I can forward to a function call.

Detailed version

I am building an application in which I would like to parse argument lists out of a text string and then convert them into the *args, **kwargs pattern to forward to an actual method. For example, if my text string is:

"hello",42,helper="Larry, the \"wise\""

the parsed result would be something comparable to:

args=['hello',42]
kwargs={'helper': 'Larry, the "wise"'}

I am aware of Python's ast module, but it only seems to provide a mechanism for parsing entire statements. I can sort of fake this by manufacturing a statement around it, e.g.

ast.parse(r'f("hello",42,helper="Larry, the \"wise\"")')

and then pull the relevant fields out of the Call node, but this seems like an awful lot of roundabout work.

Is there any way to parse just one known node type from a Python AST, or is there an easier approach for getting this functionality?

If it helps, I only need to support numeric and string arguments, although strings need to support embedded commas, escaped quotes and the like.

If there is an existing module for building lexers and parsers in Python, I am fine with defining my own AST as well, but obviously I would prefer to use functionality that already exists and has been tested.

Note: many of the answers focus on how to store the parsed results, but that's not what I care about; it's the parsing itself that I'm trying to solve, ideally without writing an entire parser engine myself.

Also, my application already uses Jinja, whose template parser includes a parser for Python-ish expressions, although it isn't clear to me how to use it to parse just one subexpression like this. (Unfortunately, this isn't going into a template but into a custom Markdown filter, where I'd like the syntax to match the corresponding Jinja template function as closely as possible.)

I think ast.parse is your best option.

If the parameters were separated by whitespace, we could use shlex.split:

>>> import shlex
>>> shlex.split(r'"hello" 42 helper="Larry, the \"wise\""')
['hello', '42', 'helper=Larry, the "wise"']

But unfortunately, that doesn't split on commas:

>>> shlex.split(r'"hello",42,helper="Larry, the \"wise\""')
['hello,42,helper=Larry, the "wise"']

I also thought about using ast.literal_eval, but that doesn't support keyword arguments:

>>> import ast
>>> ast.literal_eval(r'"hello",42')
('hello', 42)
>>> ast.literal_eval(r'"hello",42,helper="Larry, the \"wise\""')
Traceback (most recent call last):
  File "<unknown>", line 1
    "hello",42,helper="Larry, the \"wise\""
                     ^
SyntaxError: invalid syntax

I couldn't think of any Python literal that supports both positional and keyword arguments.


For lack of better ideas, here's a solution using ast.parse:

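import ast

def parse_args(args):
    # Wrap the argument text in a dummy call; mode='eval' parses exactly one expression
    tree = ast.parse('f({})'.format(args), mode='eval')
    funccall = tree.body
    if not isinstance(funccall, ast.Call):
        raise ValueError('not a plain argument list: {!r}'.format(args))

    # literal_eval accepts AST nodes and rejects anything that is not a literal
    args = [ast.literal_eval(arg) for arg in funccall.args]
    kwargs = {arg.arg: ast.literal_eval(arg.value) for arg in funccall.keywords}
    return args, kwargs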

Output:

>>> parse_args(r'"hello",42,helper="Larry, the \"wise\""')
(['hello', 42], {'helper': 'Larry, the "wise"'})
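The question's end goal is to forward the result to an actual method. Below is a hedged sketch of that last step: the parsed list and dict are simply unpacked into a call with * and **. greet is a hypothetical stand-in for the real target, and parse_args is repeated (using mode='eval') so the snippet runs on its own. Because ast.literal_eval only accepts literals, a function call hidden in the text raises ValueError instead of being executed.

```python
import ast

def parse_args(source):
    # Wrap the text in a dummy call and parse it as a single expression
    call = ast.parse('f({})'.format(source), mode='eval').body
    if not isinstance(call, ast.Call):
        raise ValueError('not a plain argument list')
    # literal_eval raises ValueError for anything that is not a literal
    args = [ast.literal_eval(a) for a in call.args]
    kwargs = {k.arg: ast.literal_eval(k.value) for k in call.keywords}
    return args, kwargs

def greet(name, times, helper=None):
    # Hypothetical target method standing in for whatever you forward to
    return '{} x{} (helper: {})'.format(name, times, helper)

args, kwargs = parse_args(r'"hello",42,helper="Larry, the \"wise\""')
result = greet(*args, **kwargs)
print(result)  # hello x42 (helper: Larry, the "wise")

# A call smuggled into the text is rejected rather than executed
try:
    parse_args('__import__("os").getcwd()')
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```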

You can use re and a simple class to keep track of the tokens:

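import re

class Akwargs:
    # Tokens: quoted strings (the second alternative also allows commas and
    # inner quotes), integers, bare identifiers, and the '=' sign
    grammar = r'"[\w\s_]+"|"[\w\s,_"]+"|\d+|[a-zA-Z0-9_]+|\='

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.args = []
        self.kwargs = {}
        self.parse()

    @staticmethod
    def unquote(token):
        # Remove only the outermost pair of quotes so inner quotes survive
        return re.sub('^"|"$', '', token)

    def parse(self):
        # Walk the tokens one at a time (a loop avoids recursion limits and
        # never swallows a token that starts a following key=value pair)
        i = 0
        while i < len(self.tokens):
            current = self.tokens[i]
            if i + 1 < len(self.tokens) and self.tokens[i + 1] == '=':
                # key = value
                if i + 2 >= len(self.tokens):
                    raise ValueError("Expecting a value for kwarg %r" % current)
                self.kwargs[current] = self.unquote(self.tokens[i + 2])
                i += 3
            else:
                # Anything not followed by '=' is a positional argument
                self.args.append(self.unquote(current))
                i += 1

# Not a raw string: Python turns \" into ", so the regex sees plain inner quotes
s = '"hello",42,helper="Larry, the \"wise\""'
tokens = iter(re.findall(Akwargs.grammar, s))
params = Akwargs(tokens)
print(params.args)
print(params.kwargs)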

Output:

['hello', '42']
{'helper': 'Larry, the "wise"'}

Full tests:

strings = ['23,"Bill","James"', 'name="someone",age=23,"testing",300', '"hello","42"', "hello=42", 'foo_bar=5']
parsed = [Akwargs(iter(re.findall(Akwargs.grammar, d))) for d in strings]
new_data = [[p.args, p.kwargs] for p in parsed]
print(new_data)

Output:

[[['23', 'Bill', 'James'], {}], [['testing', '300'], {'name': 'someone', 'age': '23'}], [['hello', '42'], {}], [[], {'hello': '42'}], [[], {'foo_bar': '5'}]]
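The hand-written regular expression has gaps: every value comes back as a string ('42' rather than 42), and the greedy second alternative can swallow several arguments at once (for example, "a, b","c" becomes a single token). A hedged alternative for the tokenizing step is the standard-library tokenize module, which is Python's own lexer and already handles embedded commas and escaped quotes. The sketch below keeps the same token-walking idea; tokenize_args is a hypothetical name, and it assumes (as the question states) that only string and number literals need to be supported.

```python
import ast
import io
import tokenize

# Token types that carry no argument content
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER}

def tokenize_args(source):
    """Parse 'value, ..., key=value' text with Python's own tokenizer.

    Values must be string or number literals (a leading minus is allowed);
    anything else raises ValueError.
    """
    toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
            if t.type not in _SKIP]
    args, kwargs = [], {}
    i = 0
    while i < len(toks):
        key = None
        # NAME immediately followed by '=' starts a keyword argument
        if (toks[i].type == tokenize.NAME and i + 1 < len(toks)
                and toks[i + 1].string == '='):
            key = toks[i].string
            if key in kwargs:
                raise ValueError('repeated keyword argument: ' + key)
            i += 2
        elif kwargs:
            raise ValueError('positional argument follows keyword argument')
        sign = ''
        if i < len(toks) and toks[i].string == '-':
            sign, i = '-', i + 1
        if i >= len(toks) or toks[i].type not in (tokenize.STRING, tokenize.NUMBER):
            raise ValueError('expected a string or number literal')
        # literal_eval decodes escapes such as \" inside the string token
        value = ast.literal_eval(sign + toks[i].string)
        i += 1
        if key is None:
            args.append(value)
        else:
            kwargs[key] = value
        # Consecutive arguments must be separated by a comma
        if i < len(toks):
            if toks[i].string != ',':
                raise ValueError('expected a comma')
            i += 1
    return args, kwargs

print(tokenize_args(r'"hello",42,helper="Larry, the \"wise\""'))
# (['hello', 42], {'helper': 'Larry, the "wise"'})
print(tokenize_args('"a, b", -3, x=1.5'))
# (['a, b', -3], {'x': 1.5})
```

Unlike the regex version, numbers come back as real ints and floats, and the ordering rules mirror Python's own (no positional argument after a keyword argument).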

You can use a function with eval to help you pick apart args and kwargs:

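import numpy as np

def f(*args, **kwargs):
    # Collector: hands back the positional and keyword arguments it receives
    return args, kwargs

# eval executes arbitrary Python code, so only use it on trusted input
eval("f(1, 'a', x=np.int32)")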

gives you

((1, 'a'), {'x': <class 'numpy.int32'>})
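Because eval runs whatever it is given, one hedged way to keep this collector-function trick while accepting only literals is to parse the wrapped call first with ast.parse(..., mode='eval'), walk the tree, and reject any node type outside a small whitelist before evaluating it. eval_args, collect and the exact whitelist are assumptions for illustration, and the whitelist assumes Python 3.8+, where every literal is an ast.Constant.

```python
import ast

def collect(*args, **kwargs):
    # Same idea as f: just hand back what it was called with
    return args, kwargs

# Node types permitted in the parsed expression (assumes Python 3.8+)
_ALLOWED = (ast.Expression, ast.Call, ast.Name, ast.Load, ast.keyword,
            ast.Constant, ast.UnaryOp, ast.USub, ast.UAdd)

def eval_args(source):
    tree = ast.parse('collect({})'.format(source), mode='eval')
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise ValueError('disallowed syntax: ' + type(node).__name__)
        if isinstance(node, ast.Name) and node.id != 'collect':
            raise ValueError('unexpected name: ' + node.id)
    # Only literals and the collector remain, so evaluate with no builtins
    code = compile(tree, '<args>', 'eval')
    return eval(code, {'__builtins__': {}, 'collect': collect})

print(eval_args(r'"hello",42,helper="Larry, the \"wise\""'))
# (('hello', 42), {'helper': 'Larry, the "wise"'})

try:
    eval_args('__import__("os").getcwd()')
except ValueError as exc:
    print('rejected:', exc)
```

Attribute access, subscripts, unknown names and * / ** unpacking all fall outside the whitelist, so they are refused before eval ever runs.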

This is not entirely what you wanted, but argparse comes close once the string has already been split into a list:

>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--helper')
>>> kwargs,args = parser.parse_known_args(["hello",'42','--helper="Larry, the \"wise\""'])
>>> vars(kwargs)
{'helper': '"Larry, the "wise""'}
>>> args
['hello', '42']
