Regex ('foo'|'bar') notation

I'm using regex to parse some time data, but my attempt is not matching as I would expect. Here's my code:

import re
print(re.findall(r"\d+:\d+ (am|pm)", "11:30 am - 2:20 pm"))

This produces ['am', 'pm'], not ['11:30 am', '2:20 pm'], which is what I want.

I can produce the result that I want with \d+:\d+ am|\d+:\d+ pm, but that is a little blunt, and I want to know why the other pattern is not working.

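Your problem is the capturing group: when the pattern contains a group, re.findall returns what the group captured instead of the whole match. If you want alternation without capturing, use a non-capturing group: \d+:\d+ (?:am|pm).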
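As a quick check of the difference, here is a minimal sketch that runs both patterns against the string from the question. The variable names are only illustrative.

```python
import re

text = "11:30 am - 2:20 pm"

# Capturing group: findall returns only what (am|pm) captured.
captured = re.findall(r"\d+:\d+ (am|pm)", text)

# Non-capturing group: findall returns the whole match.
whole = re.findall(r"\d+:\d+ (?:am|pm)", text)

print(captured)  # ['am', 'pm']
print(whole)     # ['11:30 am', '2:20 pm']
```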

You probably don't even need regular expressions to split this particular string. If applicable, you can use plain str.split():

>>> s = "11:30 am - 2:20 pm"
>>> s.split(" - ")
['11:30 am', '2:20 pm']

This, of course, does not check that the resulting items are actually time-like strings.
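If you do want that check, one option is to validate each piece after splitting. The sketch below is only an assumption about what "time-like" should mean here (one or two digits, a colon, two digits, then am or pm); split_times and TIME_RE are hypothetical names, not part of any library.

```python
import re

# Assumed definition of "time-like"; adjust to your own data.
TIME_RE = re.compile(r"\d{1,2}:\d{2} (?:am|pm)")

def split_times(s, sep=" - "):
    """Split s on sep and reject any piece that does not look like a time."""
    parts = s.split(sep)
    bad = [p for p in parts if not TIME_RE.fullmatch(p)]
    if bad:
        raise ValueError("not time-like: %r" % bad)
    return parts

print(split_times("11:30 am - 2:20 pm"))  # ['11:30 am', '2:20 pm']
```

Using fullmatch rather than match means trailing junk such as "11:30 am!" is rejected too.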

Quoting the docs (emphasis mine):

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
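To see how the number of groups changes the return value, here is a small sketch using the string from the question. The patterns are only illustrative variations.

```python
import re

text = "11:30 am - 2:20 pm"

# No groups: a list of whole matches.
no_groups = re.findall(r"\d+:\d+ [ap]m", text)

# Two groups: a list of tuples, one item per group.
two_groups = re.findall(r"(\d+:\d+) (am|pm)", text)

# An outer group around the whole pattern also yields tuples,
# with the full match as the first item.
nested = re.findall(r"(\d+:\d+ (am|pm))", text)

print(no_groups)   # ['11:30 am', '2:20 pm']
print(two_groups)  # [('11:30', 'am'), ('2:20', 'pm')]
print(nested)      # [('11:30 am', 'am'), ('2:20 pm', 'pm')]
```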

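You may use re.finditer, which yields match objects, so the full match is available even when the pattern contains a capturing group:

import re
# m.group(0) is the entire match, equivalent to m.string[m.start():m.end()]
seq = [m.group(0) for m in re.finditer(r"\d+:\d+ (am|pm)", "11:30 am - 2:20 pm")]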
# ['11:30 am', '2:20 pm']
