Setting a maximum number of occurrences with `delimitedList` using pyparsing

Question

pyparsing provides a helper function, delimitedList, that matches a sequence of one or more expressions, separated by a delimiter:

delimitedList(expr, delim=',', combine=False)
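For reference, a minimal sketch of how delimitedList behaves on its own, assuming the classic camelCase API (delimitedList, parseString, asList) is available in the installed pyparsing version. The names words and hex_pairs are made up for illustration.

```python
import pyparsing as pp

# The default delimiter is a comma; delimiters are dropped from the results.
words = pp.delimitedList(pp.Word(pp.alphas))
print(words.parseString("aa, bb, cc").asList())    # ['aa', 'bb', 'cc']

# combine=True returns the matched items and delimiters as one string.
hex_pairs = pp.delimitedList(pp.Word(pp.hexnums), delim=":", combine=True)
print(hex_pairs.parseString("AA:BB:CC").asList())  # ['AA:BB:CC']
```

Note that nothing here limits how often an item may repeat, which is the crux of the question.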

How can this be used to match a sequence of expressions, where each expression may occur zero or one times?

For example, to match "foo", "bar", "baz", I took a bottom-up approach and created a token for each word:

import pyparsing as pp

dbl_quote = pp.Suppress('"')

foo = dbl_quote + pp.Literal('foo') + dbl_quote
bar = dbl_quote + pp.Literal('bar') + dbl_quote
baz = dbl_quote + pp.Literal('baz') + dbl_quote

I want to create an expression that matches:

zero or one occurrences of "foo"
zero or one occurrences of "bar"
zero or one occurrences of "baz"

... in any order. Examples of valid input:

"foo", "bar", "baz"
"baz", "bar", "foo" // Order is unimportant
"bar", "baz" // Zero occurrences allowed
"baz"
// Zero occurrences of all tokens

Examples of invalid input:

"notfoo", "notbar", "notbaz"
"foo", "foo", "bar", "baz" // Two occurrences of foo
"foo" "bar", "baz" // Missing comma
"foo" "bar", "baz", // Trailing comma

I gravitated towards delimitedList because my input is a comma-delimited list, but now I feel this function is working against me rather than for me.

import pyparsing as pp

dbl_quote = pp.Suppress('"')

foo = dbl_quote + pp.Literal('foo') + dbl_quote
bar = dbl_quote + pp.Literal('bar') + dbl_quote
baz = dbl_quote + pp.Literal('baz') + dbl_quote



# This is NOT what I want because it allows tokens
# to occur more than once.
foobarbaz = pp.delimitedList(foo | bar | baz)



if __name__ == "__main__":
    TEST = '"foo", "bar", "baz"'
    results = foobarbaz.parseString(TEST)
    results.pprint()

Answer 1

Ordinarily, when I see "in any order" as part of a grammar, my first thought is to use Each, which you can create with the & operator:

undelimited_foo_bar_baz = foo & bar & baz

This parser would parse foo, bar, and baz in any order. If you want them to be optional, simply wrap each one in Optional:

undelimited_foo_bar_baz = pp.Optional(foo) & pp.Optional(bar) & pp.Optional(baz)
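As a quick check of that idea on input without commas, here is a minimal sketch that repeats the foo, bar, and baz definitions from the question so it runs on its own. The name any_order and the sample strings are illustrative assumptions.

```python
import pyparsing as pp

dbl_quote = pp.Suppress('"')
foo = dbl_quote + pp.Literal("foo") + dbl_quote
bar = dbl_quote + pp.Literal("bar") + dbl_quote
baz = dbl_quote + pp.Literal("baz") + dbl_quote

# Each (built with &) matches its parts in any order; an Optional part
# may be absent, but it is still matched at most once.
any_order = pp.Optional(foo) & pp.Optional(bar) & pp.Optional(baz)

print(any_order.parseString('"baz" "foo"', parseAll=True).asList())  # ['baz', 'foo']

# A repeated item is left unconsumed, so parseAll=True rejects the input.
try:
    any_order.parseString('"foo" "foo"', parseAll=True)
except pp.ParseException as err:
    print("rejected:", err)
```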

But the intervening commas in your input make this kind of messy, so as a fallback, you can stick with delimitedList (which will strip out the commas) and add a condition parse action that runs after the list is parsed, to verify that no more than one of each matched item was present:

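from collections import Counter

def no_more_than_one_of_any(t):
    # True only if every parsed token appears exactly once
    return all(freq == 1 for freq in Counter(t.asList()).values())

# foobarbaz is the delimitedList expression defined in the question
foobarbaz.addCondition(no_more_than_one_of_any, message="duplicate item found in list")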

if __name__ == "__main__":
    tests = '''\
    "foo"
    "bar"
    "baz"
    "foo", "baz"
    "foo", "bar", "baz"
    "foo", "bar", "baz", "foo"
    '''
    foobarbaz.runTests(tests)

Prints:

"foo"
['foo']

"bar"
['bar']

"baz"
['baz']

"foo", "baz"
['foo', 'baz']

"foo", "bar", "baz"
['foo', 'bar', 'baz']

"foo", "bar", "baz", "foo"
^
FAIL: duplicate item found in list, found '"'  (at char 0), (line:1, col:1)
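
The tests above do not cover the empty input or the trailing-comma case from the question. One way to handle both is sketched below as a self-contained script. It assumes that wrapping the list in Optional is acceptable for the "zero occurrences of all tokens" case and that parseAll=True is used to reject leftover text. The names item_list and is_valid are hypothetical helpers, not part of pyparsing.

```python
from collections import Counter
import pyparsing as pp

dbl_quote = pp.Suppress('"')
foo = dbl_quote + pp.Literal("foo") + dbl_quote
bar = dbl_quote + pp.Literal("bar") + dbl_quote
baz = dbl_quote + pp.Literal("baz") + dbl_quote

def no_more_than_one_of_any(tokens):
    # True only if every parsed token appears exactly once.
    return all(freq == 1 for freq in Counter(tokens.asList()).values())

item_list = pp.delimitedList(foo | bar | baz)
item_list.addCondition(no_more_than_one_of_any,
                       message="duplicate item found in list")

# Optional admits the empty input (zero occurrences of every token).
foobarbaz = pp.Optional(item_list)

def is_valid(text):
    # parseAll=True requires the whole string to be consumed, which
    # rejects trailing commas, missing commas, and unknown words.
    try:
        foobarbaz.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False

for sample in ['"baz", "bar", "foo"', '', '"foo", "foo", "bar"', '"foo", "bar",']:
    print(repr(sample), is_valid(sample))
```

With this sketch the first two samples are accepted and the last two are rejected.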

Answer 1 was accepted on 2020-03-21 21:39:57.
