
How can I get the list of names used in a formatting string?

Given a formatting string:

x = "hello %(foo)s  there %(bar)s"

Is there a way to get the names of the formatting variables without parsing the string myself?

Using a regex wouldn't be too tough, but I was wondering whether there is a more direct way to get these.

Use a dict subclass with an overridden __missing__ method. The % operator looks up each name in the mapping, so __missing__ sees every format variable that is absent and can record it:

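class StringFormatVarsCollector(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.format_vars = []

    def __missing__(self, k):
        # Called by the dict lookup for each absent key; record the name.
        # Returns None implicitly, which %s renders as 'None'.
        self.format_vars.append(k)


def get_format_vars(s):
    d = StringFormatVarsCollector()
    s % d  # the result is discarded; only the lookups matter
    return d.format_vars

>>> get_format_vars("hello %(foo)s  there %(bar)s")
['foo', 'bar']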
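
One limitation of the collector above is that __missing__ returns None, so a numeric conversion such as %(count)d raises TypeError. Here is a minimal sketch of one workaround, assuming it is acceptable to substitute 0 for every missing name (0 formats under both string and numeric conversions). The names FormatVarsCollector and get_unique_format_vars are illustrative and not part of the original answer; the sketch also drops repeated names.

```python
class FormatVarsCollector(dict):
    """Records every key that the % operator fails to find."""

    def __init__(self):
        super().__init__()
        self.format_vars = []

    def __missing__(self, key):
        # Record each name once, in order of first appearance.
        if key not in self.format_vars:
            self.format_vars.append(key)
        # Assumption: 0 is a safe stand-in, since it can be formatted
        # by %s, %r, %d, %f, %x and the other numeric conversions.
        return 0


def get_unique_format_vars(template):
    collector = FormatVarsCollector()
    template % collector  # result discarded; only the lookups matter
    return collector.format_vars


print(get_unique_format_vars("%(count)d items for %(name)s, %(count)d total"))
```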

If you don't want to parse the string, you can use this little function, which keeps retrying the format and adds each name reported by the KeyError:

def find_format_vars(string):
    found = {}
    while True:
        try:
            string % found
            break
        except KeyError as e:
            # e.args[0] is the missing key (e.message exists only in Python 2)
            found[e.args[0]] = ''
    return list(found)

>>> print(find_format_vars("hello %(foo)s there %(bar)s"))
['foo', 'bar']

The format fields are only significant to the % operator, not to the string itself. So there is no attribute like str.__format_fields__ that you can access to get the field names.

I'd say that using a regex is actually the correct approach in this case. You can easily use re.findall to extract the names:

>>> import re
>>> x = "hello %(foo)s  there %(bar)s"
>>> re.findall(r'(?<!%)%\(([^)]+)\)[diouxXeEfFgGcrs]', x)
['foo', 'bar']

Below is an explanation of the pattern:

(?<!%)             # Negative look-behind to make sure that we do not match %%
%                  # Matches %
\(                 # Matches (
(                  # Starts a capture group
[^)]+              # Matches one or more characters that are not )
)                  # Closes the capture group
\)                 # Matches )
[diouxXeEfFgGcrs]  # Matches one of the characters in the square brackets
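
The pattern above does not match specifiers that carry flags, a width, or a precision (for example %(bar)05.2f), and the look-behind also rejects a real field that directly follows an escaped %%. Here is a hedged sketch of a broader pattern, assuming the usual printf-style layout of flags, width, precision, length modifier, and conversion type. The names _SPEC and find_percent_names are illustrative and not from the original answer.

```python
import re

# Match either a literal "%%" or a full "%(name)..." specifier, so that
# an escaped percent sign is consumed before the next field is examined.
_SPEC = re.compile(
    r"%(?:%"                    # a literal %% ...
    r"|\((?P<name>[^)]*)\)"     # ... or a %(name) mapping key
    r"[#0\- +]*"                # optional flags
    r"\d*(?:\.\d*)?"            # optional width and precision
    r"[hlL]?"                   # optional length modifier
    r"[diouxXeEfFgGcrsa])"      # conversion type
)


def find_percent_names(template):
    # The "name" group is None for matches of the literal %%.
    return [m.group("name") for m in _SPEC.finditer(template)
            if m.group("name") is not None]


print(find_percent_names("100%% of %(foo)s and %(bar)05.2f"))
```

Matching %% explicitly, rather than using a look-behind, is a design choice: the scanner always moves past the escape as a unit, so the field in "%%%(foo)s" is still found.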

New-style string formatting (str.format with {name} fields) has this ability built in.

from string import Formatter

f = Formatter()
x = "hello {foo}s  there {bar}s"
parsed = f.parse(x)

Iterating over parsed yields tuples of this form:
(literal_text, field_name, format_spec, conversion)

So it's simple enough to pull out the field_name element of each tuple, skipping the None that marks a chunk of literal text with no field after it:

field_names = [tup[1] for tup in parsed if tup[1] is not None]

Here's the documentation if you would like more in-depth information: https://docs.python.org/2/library/string.html#string.Formatter
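
As a rough sketch of how this could be wrapped into a helper: field_name may include attribute or index access (for example user.name or items[0]), so the version below keeps only the leading name and drops duplicates. The function name get_field_names and the choice to strip everything after the first "." or "[" are assumptions for illustration, not part of the answer.

```python
import re
from string import Formatter


def get_field_names(template):
    names = []
    for _literal, field_name, _spec, _conv in Formatter().parse(template):
        if field_name is None:
            # Chunk of literal text with no replacement field after it.
            continue
        # Keep only the part before any attribute or index access.
        root = re.split(r"[.\[]", field_name, maxsplit=1)[0]
        if root not in names:
            names.append(root)
    return names


print(get_field_names("hello {foo}s  there {bar}s"))
```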

Single list-comprehension version (note that _formatter_parser is a private str method that exists only in Python 2):

[tup[1] for tup in "hello {foo}s  there {bar}s"._formatter_parser()]


 