I'm working with pyparsing and trying to define two items as follows:
identifier = Word(alphas, alphanums).setName('identifier)
database_name = Optional(identifier.setResultsName('user') + Suppress('.')) + identifier.setResultsName('database')
table_name = database_name + Suppress('.') + identifier.setResultsName('table')
The idea is that matching against table_name
, it will take a string with two or three segments and result in the following:
mark.foo.bar
=> tokens.user = 'mark'
tokens.database = 'foo'
tokens.table = 'bar'
Or if the first segment is missing:
foo.bar
=> tokens.user = '' #anything is acceptable: none, empty string or just plain missing
tokens.database = 'foo'
tokens.table = 'bar'
table_name
should always have two segments and one dot, or three segments (two dots) as above. One segment is not acceptable.
database_name
should have either one segment (database) or two (user.database).
Instances of using database_name
work fine - it'll match on one or two segments. However, table_name
fails in some cases:
# Works for three segments
mark.foo.bar
=> tokens.user = 'mark'
tokens.database = 'foo'
tokens.table = 'bar'
# Fails for two
foo.bar
=> Expected "." (at char 7), (line:1m col:8)
I can see what its doing: foo.bar
has been matched to user.database and it's now expecting the third chunk representing the table name. However its not what I want.
Help?
The problem is that, when you match the leading identifier, you don't know enough to say whether it's going to be a user field or not, not until you've looked at all of the possible table fields. Unfortunately, this means that you can't define database name with its leading optional 'user' field by itself, you have to define a comprehensive table_name
expression, having two or three fields.
The following code show 3 options for resolving the ambiguity of the leading optional identifier:
try to match the full 3-field form first, and if that fails, try to match the 2-field form
explicitly lookahead when matching the optional leading 'user' field, using FollowedBy to match the 'user' only if it is followed by 2*(DOT+identifier)
match all dot-delimited lists of any length, and use a parse action to verify that only 2 or 3 identifiers have been passed, and assign results names
See the comments to see how each option is implemented. (Note that to simplify the code, I have also replaced use of the full expr.setResultsName('something')
to just expr('something')
, which I think is overall easier to read.)
from pyparsing import *
identifier = Word(alphas, alphanums).setName('identifier')
DOT = Suppress('.')
# Option 1 - fully specified options
full_database_name = identifier('user') + DOT + identifier('database')
just_database_name = identifier('database')
table_name = (full_database_name + DOT + identifier('table') |
just_database_name + DOT + identifier('table'))
# Option 2 - use FollowedBy to explicitly lookahead when checking for leading user
table_name = (Optional(identifier('user') + FollowedBy(2*(DOT+identifier)) + DOT) +
identifier('database') + DOT + identifier('table'))
# Option 3 - use liberally matching expression, with a parse action to assign fields
def assignTableFields(fields):
if len(fields) == 2:
fields['database'],fields['table'] = fields
elif len(fields) == 3:
fields['user'],fields['database'],fields['table'] = fields
else:
raise ParseException("wrong number of fields")
table_name = delimitedList(identifier, delim='.').setParseAction(assignTableFields)
for test in ("a.b.c", "b.c"):
print test
print table_name.parseString(test).dump()
print
You may also find this overly liberal of a matcher, as it also permits interleaved whitespace, so that "a . b"
will also qualify as a valid table name. You can define another validating parse action, and add it to table_name as:
def noWhitespace(source, locn, tokens):
if not source[locn:].startswith('.'.join(tokens)):
raise ParseException("found whitespace between fields")
table_name.addParseAction(noWhitespace)
See that for this parse action, I called addParseAction
instead of setParseAction
, so that any existing parse actions would be preserved (in the case of Option 3), and this new one added to the chain of parse actions to be run.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.