Optional string segment in pyparsing

I'm working with pyparsing and trying to define two items as follows:

identifier = Word(alphas, alphanums).setName('identifier')

database_name = Optional(identifier.setResultsName('user') + Suppress('.')) + identifier.setResultsName('database')
table_name = database_name + Suppress('.') + identifier.setResultsName('table')

The idea is that, when matching against table_name, it will take a string with two or three segments and produce the following:

mark.foo.bar
=> tokens.user = 'mark'
   tokens.database = 'foo'
   tokens.table = 'bar'

Or if the first segment is missing:

foo.bar
=> tokens.user = ''  #anything is acceptable: none, empty string or just plain missing
   tokens.database = 'foo'
   tokens.table = 'bar'

table_name should always have two segments and one dot, or three segments (two dots) as above. One segment is not acceptable.

database_name should have either one segment (database) or two (user.database).

Instances of using database_name work fine - it'll match on one or two segments. However, table_name fails in some cases:

# Works for three segments
mark.foo.bar
=> tokens.user = 'mark'
   tokens.database = 'foo'
   tokens.table = 'bar'

# Fails for two
foo.bar
=> Expected "." (at char 7), (line:1, col:8)

I can see what it's doing: foo.bar has been matched to user.database and it's now expecting the third chunk representing the table name. However, that's not what I want.

Help?

The problem is that when the parser matches the leading identifier, it cannot yet tell whether that identifier is a user field; that only becomes clear once all of the dotted fields have been seen. Unfortunately, this means you can't define database_name with its optional leading 'user' field on its own and then build table_name from it. You have to define a comprehensive table_name expression that covers both the two-field and three-field forms.

The following code shows three options for resolving the ambiguity of the leading optional identifier:

  1. try to match the full 3-field form first, and if that fails, try to match the 2-field form

  2. explicitly lookahead when matching the optional leading 'user' field, using FollowedBy to match the 'user' only if it is followed by 2*(DOT+identifier)

  3. match all dot-delimited lists of any length, and use a parse action to verify that only 2 or 3 identifiers have been passed, and assign results names

The comments show how each option is implemented. (Note that, to simplify the code, I have also replaced the full expr.setResultsName('something') with the shorthand expr('something'), which I think is easier to read overall.)
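
The shorthand mentioned in the note can be checked in isolation. This is a minimal sketch, assuming pyparsing is installed; the names long_form and short_form are only illustrative and do not appear in the original code.

```python
from pyparsing import Word, alphas, alphanums

identifier = Word(alphas, alphanums)

# Both calls return a *copy* of identifier that carries the results name;
# identifier itself stays unnamed and can be reused under other names.
long_form = identifier.setResultsName('database')
short_form = identifier('database')

long_result = long_form.parseString("foo")
short_result = short_form.parseString("foo")
plain_result = identifier.parseString("foo")

print(long_result['database'], short_result['database'])
print('database' in plain_result)
```

Because each call returns a copy, the same identifier expression can appear as 'user', 'database', and 'table' within one grammar without the names clashing.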

from pyparsing import *

identifier = Word(alphas, alphanums).setName('identifier')
DOT = Suppress('.')

# Option 1 - fully specified options
full_database_name = identifier('user') + DOT + identifier('database')
just_database_name = identifier('database')
table_name = (full_database_name + DOT + identifier('table') | 
              just_database_name + DOT + identifier('table'))

# Option 2 - use FollowedBy to explicitly lookahead when checking for leading user
table_name = (Optional(identifier('user') + FollowedBy(2*(DOT+identifier)) + DOT) + 
                identifier('database') + DOT + identifier('table'))

# Option 3 - use liberally matching expression, with a parse action to assign fields
def assignTableFields(fields):
    if len(fields) == 2:
        fields['database'],fields['table'] = fields
    elif len(fields) == 3:
        fields['user'],fields['database'],fields['table'] = fields
    else:
        raise ParseException("wrong number of fields")
table_name = delimitedList(identifier, delim='.').setParseAction(assignTableFields)

for test in ("a.b.c", "b.c"):
    print(test)
    print(table_name.parseString(test).dump())
    print()
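
Since each option rebinds table_name, the test loop above only exercises Option 3. The following is a sketch that keeps the three expressions under separate names so they can be compared side by side. The names option1, option2, option3, and parse_fields are hypothetical and only used for illustration, and parseAll=True is my addition so that trailing input is rejected.

```python
from pyparsing import (FollowedBy, Optional, ParseException, Suppress, Word,
                       alphanums, alphas, delimitedList)

identifier = Word(alphas, alphanums).setName('identifier')
DOT = Suppress('.')

# Option 1: try the 3-field form first, then fall back to the 2-field form
option1 = (identifier('user') + DOT + identifier('database') + DOT + identifier('table')
           | identifier('database') + DOT + identifier('table'))

# Option 2: accept a leading 'user' only when two more dotted fields follow
option2 = (Optional(identifier('user') + FollowedBy(2 * (DOT + identifier)) + DOT)
           + identifier('database') + DOT + identifier('table'))

# Option 3: match any dotted list, then validate and name the fields
def assign_table_fields(fields):
    if len(fields) == 2:
        fields['database'], fields['table'] = fields
    elif len(fields) == 3:
        fields['user'], fields['database'], fields['table'] = fields
    else:
        raise ParseException("wrong number of fields")

option3 = delimitedList(identifier, delim='.').setParseAction(assign_table_fields)

def parse_fields(expr, text):
    """Return the named fields as a plain dict (illustrative helper)."""
    return expr.parseString(text, parseAll=True).asDict()

for name, expr in (("option1", option1), ("option2", option2), ("option3", option3)):
    for test in ("a.b.c", "b.c"):
        print(name, test, parse_fields(expr, test))
```

With all three options, the two-field input yields no 'user' key at all, which the question lists as an acceptable outcome.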

You may also find this matcher too liberal: pyparsing skips whitespace between tokens by default, so "a . b" also qualifies as a valid table name. To reject that, define another validating parse action and add it to table_name:

def noWhitespace(source, locn, tokens):
    if not source[locn:].startswith('.'.join(tokens)):
        raise ParseException("found whitespace between fields")
table_name.addParseAction(noWhitespace)

Note that for this parse action I called addParseAction instead of setParseAction, so that any existing parse actions are preserved (as in the case of Option 3) and the new one is appended to the chain of parse actions to be run.
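
The difference between the two calls can be seen in a small sketch. Here count_fields is a hypothetical stand-in for Option 3's parse action that only checks the number of fields, and is_valid is an illustrative helper; neither is part of the code above.

```python
from pyparsing import ParseException, Word, alphanums, alphas, delimitedList

identifier = Word(alphas, alphanums).setName('identifier')

def count_fields(tokens):
    # stand-in for Option 3's action: accept only 2 or 3 fields
    if len(tokens) not in (2, 3):
        raise ParseException("wrong number of fields")

def no_whitespace(source, locn, tokens):
    # the matched text must be exactly the tokens joined by dots
    if not source[locn:].startswith('.'.join(tokens)):
        raise ParseException(source, locn, "found whitespace between fields")

# addParseAction appends: count_fields runs first, then no_whitespace
chained = delimitedList(identifier, delim='.').setParseAction(count_fields)
chained.addParseAction(no_whitespace)

# setParseAction replaces: count_fields is discarded, only no_whitespace runs
replaced = delimitedList(identifier, delim='.').setParseAction(count_fields)
replaced.setParseAction(no_whitespace)

def is_valid(expr, text):
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False

for text in ("a.b.c", "a . b", "a"):
    print(repr(text), is_valid(chained, text), is_valid(replaced, text))
```

The single-segment input "a" is rejected by chained but accepted by replaced, because the second setParseAction call dropped the field-count check.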
