简体   繁体   中英

Pyparsing: Quick Reference parser definition correct?

Working my way through:

Pyparsing Quick Reference, Chapter 3: Small Example -

The example parser is supposed to match valid Python identifiers, so

'a_#'

should be invalid, like the author comments it to be, right? However, at the bottom of the page:

---Test for 'a_#'
  Matches: ['a', '_']

Here's the parser:

first = pp.Word(pp.alphas+"_", exact=1)
rest = pp.Word(pp.alphanums+"_")
identifier = first+pp.Optional(rest)

I'm not sure, so I'd like some feedback before contacting the author.

Also, I'm trying to correct it by constructing a parser that would only accept the defined character range within the whole string, so it wouldn't match a prefix of it. Can't get it right - any advice?

Yikes! Building up an identifier using two Word s is wasteful, inefficient, and just bad pyparsing practice. I think the author was doing this as a buildup to showing how Combine could be used here, but afterword, he should show the better alternative using just a single Word expression.

Word has a two-argument format (clearly described in the online docs ) for just this situation:

valid_ident_leading_chars = alphas + '_'
valid_ident_body_chars = alphanums + '_'
identifier = Word(valid_ident_leading_chars, valid_ident_body_chars)

(BTW, this is equivalent to:

identifier = Regex('['+valid_ident_leading_chars+']['+valid_ident_body_chars+']*')

And if you look in the pyparsing code, you'll see that Word implements its matching by building that very regular expression.)

This will still parse the leading part of 'a_#', the same as a regular expression would. If you want your test to fail because the full string was not parsed, use:

identifier.parseString('a_#', parseAll=True)

For simplicity in writing tests, you can also use '==' - when comparing a pyparsing expression with a string, the expression will run expr.parseString(comparison_string, parseAll=True) , and return True/False depending on whether a ParseException was raised or not.

assert 'a_' == identifier    # <-- will pass
assert 'a_#' == identifier   # <-- will fail

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM