I'm using regex to parse some time data, but my attempt is not matching as I would expect. Here's my code:
import re
print(re.findall(r"\d+:\d+ (am|pm)", "11:30 am - 2:20 pm"))
This produces ['am', 'pm'], not ['11:30 am', '2:20 pm'], which is what I want.
I can produce the result that I want with \d+:\d+ am|\d+:\d+ pm, but that is a little blunt, and I want to know why the other pattern is not working.
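Your problem relates to capturing groups: when the pattern contains a group, re.findall returns what the group captured rather than the whole match. If you want alternation without capturing, use a non-capturing group: \d+:\d+ (?:am|pm)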
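To see the difference concretely, here is a minimal sketch that runs both patterns on the string from the question. It assumes Python 3 and uses raw strings; the variable names are only illustrative.

```python
import re

text = "11:30 am - 2:20 pm"

# Capturing group: findall returns only what the group matched.
capturing = re.findall(r"\d+:\d+ (am|pm)", text)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r"\d+:\d+ (?:am|pm)", text)

print(capturing)      # ['am', 'pm']
print(non_capturing)  # ['11:30 am', '2:20 pm']
```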
You probably don't even need regular expressions to split this particular string. If applicable, you can use the regular str.split():
>>> s = "11:30 am - 2:20 pm"
>>> s.split(" - ")
['11:30 am', '2:20 pm']
This, of course, does not check that the resulting items are actually "time"-like strings.
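If some validation is wanted on top of the split, one possible sketch is to check each piece with re.fullmatch. The helper name split_times, the TIME_RE pattern, and the choice to raise ValueError are assumptions made for illustration, not part of the original answer; re.fullmatch requires Python 3.4 or later.

```python
import re

# Hypothetical pattern for a "time"-like string such as "11:30 am".
TIME_RE = re.compile(r"\d{1,2}:\d{2} (?:am|pm)")

def split_times(s, sep=" - "):
    """Split s on sep and verify every piece looks like a time."""
    parts = s.split(sep)
    bad = [p for p in parts if not TIME_RE.fullmatch(p)]
    if bad:
        raise ValueError("not time-like: %r" % bad)
    return parts

print(split_times("11:30 am - 2:20 pm"))  # ['11:30 am', '2:20 pm']
```

The pattern only checks the shape of each piece, so a value such as "99:99 am" would still pass.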
Quoting the docs (emphasis mine):
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
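The three cases described in that passage can be sketched as follows, again on the string from the question. This assumes Python 3; the patterns with zero and two groups are variations introduced here for illustration.

```python
import re

text = "11:30 am - 2:20 pm"

# No groups: a list of whole matches.
no_groups = re.findall(r"\d+:\d+ [ap]m", text)

# One group: a list of that group's contents.
one_group = re.findall(r"\d+:\d+ (am|pm)", text)

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\d+:\d+) (am|pm)", text)

print(no_groups)   # ['11:30 am', '2:20 pm']
print(one_group)   # ['am', 'pm']
print(two_groups)  # [('11:30', 'am'), ('2:20', 'pm')]
```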
Alternatively, you may use re.finditer and take the full text of each match, which is unaffected by the capturing group:
seq = [m.group(0) for m in re.finditer(r"\d+:\d+ (am|pm)", "11:30 am - 2:20 pm")]
# ['11:30 am', '2:20 pm']