
Python regex: optionally match a word

I have the following regex:

PackageQuantity:\b|Servings?PerContainer:\b|Servings?PerPackage:\b(\d+)

that is supposed to match the following text:

ServingsPerContainer:about11

Whitespace has been removed from the text for convenience.

The idea is that the labels Package Quantity, Servings per container, or Servings per package can be followed by an optional word (exactly one word), such as approx. or about, before the number.

It seems simple enough, but I couldn't find a solution, because the regex above returns an empty string instead of the figure.

pythonregex.com output:

>>> regex = re.compile("PackageQuantity:\b|Servings?PerContainer:\b|Servings?PerPackage:\b(\d+)",re.IGNORECASE)
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x672858ed0eef4da0>
>>> regex.match(string)
<_sre.SRE_Match object at 0x672858ed0ee8c6a8>

# List the groups found
>>> r.groups()
(None,)

# List the named dictionary objects found
>>> r.groupdict()
{}

# Run findall
>>> regex.findall(string)
[u'']

# Run timeit test
>>> setup = ur"import re; regex =re.compile("PackageQuantity:\b|Servings?PerContainer:\b|S ...
>>> t = timeit.Timer('regex.search(string)',setup)
>>> t.timeit(10000)
0.0259890556335

Your pattern does not account for the optional word after the colon.

Either of these should do the trick:

(PackageQuantity:|(Servings)?PerContainer:|(Servings)?PerPackage:)[a-zA-Z.]*(\d+)

or, if your list of words is not too long:

(PackageQuantity:|(Servings)?PerContainer:|(Servings)?PerPackage:)(about|approx\.)?(\d+)

In both patterns the number is captured by the last group.
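As a minimal sketch of the first suggestion in Python: the helper name `extract_count` and the extra test strings are assumptions for illustration, and the prefix uses non-capturing groups with `Servings?` (as in the question) so that the number is always group 1.

```python
import re

# Hypothetical pattern: one of the three labels, a colon, an optional
# run of letters/dots (the qualifier word), then the captured number.
PATTERN = re.compile(
    r"(?:PackageQuantity|Servings?PerContainer|Servings?PerPackage):"
    r"[a-zA-Z.]*"   # optional qualifier such as "about" or "approx."
    r"(\d+)",       # the figure we want
    re.IGNORECASE,
)

def extract_count(text):
    """Return the number following the label as an int, or None if absent."""
    m = PATTERN.search(text)
    return int(m.group(1)) if m else None

print(extract_count("ServingsPerContainer:about11"))  # 11
print(extract_count("PackageQuantity:approx.30"))     # 30
print(extract_count("ServingPerPackage:4"))           # 4 (no qualifier word)
```

Because the qualifier is matched with `*`, the same pattern covers input both with and without the extra word.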

You need to include about or approx in your pattern.

>>> import re
>>> s = "ServingsPerContainer:about11"
>>> m = re.search(r'(?:PackageQuantity:\b|Servings?PerContainer:\b|Servings?PerPackage:\b)(?:about|approx)(\d+)', s, re.I)
>>> m
<_sre.SRE_Match object at 0x7f0531c7a648>
>>> m.group()
'ServingsPerContainer:about11'
>>> m.group(1)
'11'

Or, to skip any run of non-digit characters before the number:

>>> m = re.search(r'(?:PackageQuantity:\b|Servings?PerContainer:\b|Servings?PerPackage:\b)\D*(\d+)', s, re.I)
>>> m.group()
'ServingsPerContainer:about11'

In your regex, you are effectively matching:

  • PackageQuantity:\b or
  • Servings?PerContainer:\b or
  • Servings?PerPackage:\b(\d+)

Note that the (\d+) is attached only to the last of those clauses.

Also, \d+ matches one or more digits; it will not match things like "about" or "approx". \w+ might be closer to what you are looking for: it matches letters, digits, and underscores. Something like:
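The effect of alternation binding more loosely than concatenation can be checked directly. This is a small sketch; the input strings without a qualifier word are made up for the demonstration.

```python
import re

# The pattern from the question: (\d+) belongs to the third alternative only.
original = re.compile(
    r"PackageQuantity:\b|Servings?PerContainer:\b|Servings?PerPackage:\b(\d+)",
    re.IGNORECASE,
)

# Second alternative matches; it contains no group, so group(1) is None.
m1 = original.search("ServingsPerContainer:11")
print(m1.group(), m1.group(1))   # ServingsPerContainer: None

# Third alternative matches; only here is the number captured.
m2 = original.search("ServingsPerPackage:11")
print(m2.group(), m2.group(1))   # ServingsPerPackage:11 11

# Wrapping the alternatives in a non-capturing group makes (\d+)
# apply to all three labels.
grouped = re.compile(
    r"(?:PackageQuantity|Servings?PerContainer|Servings?PerPackage):\b(\d+)",
    re.IGNORECASE,
)
m3 = grouped.search("ServingsPerContainer:11")
print(m3.group(1))               # 11
```

The grouped version still does not allow a word such as "about" between the colon and the digits; it only fixes the precedence problem.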

(PackageQuantity:\b|Servings?PerContainer:\b|Servings?PerPackage:\b)\w+
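One caveat with `\w`: it also matches digits, so `\w+` swallows the number together with the word. The sketch below (variable names are illustrative, and the prefix is written with a non-capturing group) compares a few ways of capturing the digits separately.

```python
import re

PREFIX = r"(?:PackageQuantity|Servings?PerContainer|Servings?PerPackage):"
s = "ServingsPerContainer:about11"

# \w+ consumes the word and the digits as one run.
whole = re.search(PREFIX + r"\w+", s, re.I)
print(whole.group())        # ServingsPerContainer:about11

# Greedy \w* takes "about11", then gives back one character for \d+.
greedy = re.search(PREFIX + r"\w*(\d+)", s, re.I)
print(greedy.group(1))      # 1

# Lazy \w*? stops as soon as \d+ can match, so all digits are captured.
lazy = re.search(PREFIX + r"\w*?(\d+)", s, re.I)
print(lazy.group(1))        # 11

# [^\W\d] means "a word character that is not a digit".
letters = re.search(PREFIX + r"[^\W\d]*(\d+)", s, re.I)
print(letters.group(1))     # 11
```

The lazy quantifier or the digit-free character class avoids losing the leading digits of the number.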

