I would like to match a word that ends with either _foo or _bar. I wrote this:
identifier = Word(alphanums + '_')
string = identifier + Suppress('_') + oneOf('foo bar')
Unfortunately, I realized that identifier is greedy: because '_' is part of its character set, it consumes the whole keyword, suffix included.
How do I force identifier to be non-greedy?
$ string.parseString('a_keyword_foo')
ParseException: Expected "_" (at char 13), (line:1, col:14)
Some valid keywords:
a_keyword_foo # ['a_keyword', 'foo']
foo_bar_foo # ['foo_bar', 'foo']
bar_bar # ['bar', 'bar']
Some invalid keywords:
keyword_foo_foobar
2keywords_bar # The leading number is perhaps another question...
foo _bar
_foo
Once you know what you're looking for, you can use pp.SkipTo:
In [38]: foo_or_bar = Literal('foo') | Literal('bar')
In [39]: string = SkipTo(Literal('_') + foo_or_bar) + Literal('_') + foo_or_bar
In [42]: string.parseString('frumpy _foo')
Out[42]: (['frumpy ', '_', 'foo'], {})
Unfortunately, if the pattern can appear more than once, SkipTo stops at the first occurrence:
In [44]: string.parseString('frumpy _foo _foo')
Out[44]: (['frumpy ', '_', 'foo'], {})
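The problem is that pyparsing does not backtrack the way a regular expression engine does: once an expression has matched, it will not give characters back to let a later expression succeed. If you're concerned about the second case too, you'll have to define the keyword as one or more things ending with an underscore + foo or bar (as above), and then take the last one.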
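A related way to reach the last occurrence, rather than literally collecting every match and keeping the final one, is to make end-of-input part of the target that SkipTo searches for. The sketch below assumes the keyword is the entire input; the names suffix and keyword are only illustrative, and the skipped text is not validated as an identifier (spaces or a leading digit would still be accepted).

```python
from pyparsing import ParseException, SkipTo, StringEnd, Suppress, oneOf

# The suffix only counts if nothing but the end of the input follows it,
# so earlier "_foo" / "_bar" occurrences are skipped over.
suffix = Suppress('_') + oneOf('foo bar') + StringEnd()
keyword = SkipTo(suffix) + suffix

print(keyword.parseString('a_keyword_foo').asList())
print(keyword.parseString('frumpy _foo _foo').asList())

try:
    keyword.parseString('keyword_foo_foobar')
except ParseException as err:
    print('rejected:', err)
```

If the identifier part also has to be checked, a parse action or a second parse of the skipped text would be needed on top of this.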
If you have to, or can, switch to the re API, you can use non-greedy matching there:
import re
# Verbose pattern: whitespace and comments inside it are ignored.
p = re.compile(r"""([a-z_]+?)  # lazily matched identifier
                   _(bar|foo)  # "_" followed by foo or bar
                """, re.VERBOSE)
subject_string = 'a_hello_foo'
m = p.match(subject_string)
print("groups:", m.groups())
print("group 1:", m.group(1))
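Note that match() only anchors at the start of the string, so an input such as keyword_foo_foobar would still match on its first _foo. A minimal sketch that requires the whole string to match, checked against the examples from the question, could look like this. The character classes (a letter first, then letters, digits or underscores) are an assumption about what counts as an identifier, and split_keyword is a hypothetical helper name.

```python
import re

# Assumed identifier rule: starts with a letter, then letters/digits/underscores.
# The lazy "*?" lets the final "_foo" / "_bar" be captured as the suffix.
KEYWORD = re.compile(r"([A-Za-z][A-Za-z0-9_]*?)_(foo|bar)")

def split_keyword(text):
    """Return (identifier, suffix), or None if text is not a valid keyword."""
    m = KEYWORD.fullmatch(text)  # the whole string must match
    return m.groups() if m else None

for text in ["a_keyword_foo", "foo_bar_foo", "bar_bar",
             "keyword_foo_foobar", "2keywords_bar", "foo _bar", "_foo"]:
    print(repr(text), "->", split_keyword(text))
```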
You can also use a regular expression from within pyparsing, through its Regex class.
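As a rough sketch of that option, a Regex element can carry the lazy pattern while the rest of the grammar stays in pyparsing. The group names name and suffix, the identifier character classes, and the \Z end anchor are assumptions made for this illustration; Regex returns the whole match as one token and exposes named groups as results names.

```python
from pyparsing import ParseException, Regex

# Lazy identifier, then "_" and the required suffix; \Z anchors at end of input.
keyword = Regex(r"(?P<name>[A-Za-z][A-Za-z0-9_]*?)_(?P<suffix>foo|bar)\Z")

result = keyword.parseString("a_keyword_foo")
print(result["name"], result["suffix"])

try:
    keyword.parseString("keyword_foo_foobar")
except ParseException as err:
    print("rejected:", err)
```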