
How to parse a string with Pyparsing which involves ignoring exceptions and moving onto the next delimiter?

I have a program that requires a list of effects, each followed by a start time and an end time. The string comes from user input and can be faulty, so I'm trying to parse the relevant information and ignore faulty entries, moving on to the next effect after each ";". However, I'm not quite sure how to use the Pyparsing library to do this, and I'm wondering whether it can be done purely with the library. The comments in the code show what each call should return, and the output below is what it actually returns.

import pyparsing as pp

testcase = "bounce, 5, 10; shutter, 12, 14" # returns [[bounce, 5, 10], [shutter, 12, 14]]
testcase2= "bounce, 5, 10; shutter, 12, 14; low_effort, 2, 23" # returns [[bounce, 5, 10], [shutter, 12, 14], [low_effort, 2, 23]]
testcase3= "_lolw, a, 2; effect, 6;" # returns [[effect, 6, None]]
testcase4= "bounce, 1, 10; effect, 5, a; bounce, 2, 10" # returns [[bounce, 1, 10], [bounce, 2, 10]]
testcase5= ";;;effect, 10; bounce, a, 1; bounce, 3, 10" # returns [[effect, 10, None], [bounce, 3, 10]]
testcase6= "effect, b, a; 9, 10, 11; max9, 10, 11; here, 2, 3; !b, 1, 2;;;" # returns [[here, 2, 3]]

def parseKeyframes(string: str):
    comma = pp.Suppress(",")
    pattern = pp.Word(pp.alphas + "_") + comma + pp.Word(pp.nums) + pp.Optional(comma + pp.Word(pp.nums), default=None)
    # parse pattern separated by ";"
    pattern = pattern | pp.SkipTo(pp.Literal(";"))
    parsed = pp.delimitedList(pp.Group(pattern), ";")
    parsed = parsed.parseString(string)
    return parsed

print(parseKeyframes(testcase))
print(parseKeyframes(testcase2))
print(parseKeyframes(testcase3))
print(parseKeyframes(testcase4))
print(parseKeyframes(testcase5))
print(parseKeyframes(testcase6))

Output:

[['bounce', '5', '10'], ['shutter', '12', '14']]
[['bounce', '5', '10'], ['shutter', '12', '14'], ['low_effort', '2', '23']]
[['_lolw, a, 2'], ['effect', '6', None]]
[['bounce', '1', '10'], ['effect', '5', None]]
[[''], [''], [''], ['effect', '10', None], ['bounce, a, 1'], ['bounce', '3', '10']]
[['effect, b, a'], ['9, 10, 11'], ['max9, 10, 11'], ['here', '2', '3'], ['!b, 1, 2'], [''], ['']]

You have the right idea; you just need to add a few additional terms and suppress the errors.

  1. Add '.suppress()' to the SkipTo term so that the skipped invalid text is suppressed from the output.
  2. To catch the case where a partial match is followed by invalid text (such as 'effect, 5, a'), add a FollowedBy term so that a match must be followed by either a ';' or the end of the string.
  3. To handle an invalid match at the end of the string, skip to ';' or the end of the string.

Since ";" | end_of_string occurs twice for very similar purposes, I created a next_delim term for it. This makes the lookaheads a little clearer. I also defined word and integer terms, which made pattern easier for me to follow.
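To see why the lookahead matters, here is a minimal sketch that tries the partial-match input 'effect, 5, a' against the pattern with and without a FollowedBy term. The names strict_pattern and matches are hypothetical helpers introduced only for this illustration, and the sketch assumes a pyparsing version that still provides the camelCase names used in the question (parseString, Optional).

```python
import pyparsing as pp

comma = pp.Suppress(",")
word = pp.Word(pp.alphas + "_")
integer = pp.Word(pp.nums)
pattern = word + comma + integer + pp.Optional(comma + integer, default=None)

# ';' or the end of the string
next_delim = pp.Literal(";") | pp.StringEnd()

# Same pattern, but the match only counts if a delimiter comes right after it
strict_pattern = pattern + pp.FollowedBy(next_delim)


def matches(expr, text):
    """Return True if expr matches at the start of text (illustration helper)."""
    try:
        expr.parseString(text)
        return True
    except pp.ParseBaseException:
        return False


# Without the lookahead, 'effect, 5' is accepted and ', a' is silently left over
loose_ok = matches(pattern, "effect, 5, a")
# With the lookahead, the leftover ', a' makes the whole entry fail
strict_ok = matches(strict_pattern, "effect, 5, a")
# A well-formed entry followed by ';' still passes
strict_valid = matches(strict_pattern, "effect, 5, 7; more")

print(loose_ok, strict_ok, strict_valid)
```

Once the strict pattern fails on such an entry, the parser can fall through to the SkipTo alternative, which discards the entry up to the next delimiter.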

I think this slightly modified version of your parser will give you your expected results:

comma = pp.Suppress(",")

# ';'-delimited list of  word , int [, int]
word = pp.Word(pp.alphas + "_")
integer = pp.Word(pp.nums)
pattern = (word
           + comma 
           + integer 
           + pp.Optional(comma + integer, default=None))


# parse pattern separated by ";"
next_delim = ";" | pp.StringEnd()
skip_invalid = pp.SkipTo(next_delim).suppress()
pattern_or_skip = (pp.Group(pattern + pp.FollowedBy(next_delim))
                   | skip_invalid)

parser = pp.delimitedList(pattern_or_skip, delim=";")
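As a quick check, the parser above can be wrapped in a function and run against the six test strings from the question. This is only a sketch: parse_keyframes, testcases, and results are names chosen here for illustration, and it assumes a pyparsing version where delimitedList, Optional, parseString, and asList are still available.

```python
import pyparsing as pp


def parse_keyframes(text):
    """Parse ';'-separated 'word, int [, int]' entries, dropping invalid ones."""
    comma = pp.Suppress(",")
    word = pp.Word(pp.alphas + "_")
    integer = pp.Word(pp.nums)
    pattern = word + comma + integer + pp.Optional(comma + integer, default=None)

    # ';' or the end of the string, used for both the lookahead and the skip
    next_delim = pp.Literal(";") | pp.StringEnd()
    skip_invalid = pp.SkipTo(next_delim).suppress()
    pattern_or_skip = pp.Group(pattern + pp.FollowedBy(next_delim)) | skip_invalid

    parser = pp.delimitedList(pattern_or_skip, delim=";")
    return parser.parseString(text).asList()


testcases = [
    "bounce, 5, 10; shutter, 12, 14",
    "bounce, 5, 10; shutter, 12, 14; low_effort, 2, 23",
    "_lolw, a, 2; effect, 6;",
    "bounce, 1, 10; effect, 5, a; bounce, 2, 10",
    ";;;effect, 10; bounce, a, 1; bounce, 3, 10",
    "effect, b, a; 9, 10, 11; max9, 10, 11; here, 2, 3; !b, 1, 2;;;",
]

results = [parse_keyframes(t) for t in testcases]
for r in results:
    print(r)
```

Note that the numbers come back as strings (for example '5'), as in the question's own output; converting them to int would need an extra parse action on integer.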
