Regex ('foo'|'bar') notation

Question

I'm using regex to parse some time data, but my attempt is not matching as I would expect. Here's my code:

import re
print(re.findall(r"\d+:\d+ (am|pm)", "11:30 am - 2:20 pm"))

This produces ['am', 'pm'], not ['11:30 am', '2:20 pm'], which is what I want.

I can get the result I want with \d+:\d+ am|\d+:\d+ pm, but that is a little blunt. Why doesn't the first pattern work?

Answer 1

Your problem relates to capturing groups: when the pattern contains a group, re.findall returns what the group matched rather than the whole match. If you want non-capturing alternation, use the regex \d+:\d+ (?:am|pm).
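As a minimal sketch of the difference, here are both patterns run against the string from the question. The variable names are only for illustration.

```python
import re

text = "11:30 am - 2:20 pm"

# Capturing group: findall returns only what the group matched.
capturing = re.findall(r"\d+:\d+ (am|pm)", text)

# Non-capturing group (?:...): findall returns the whole match.
non_capturing = re.findall(r"\d+:\d+ (?:am|pm)", text)

print(capturing)      # ['am', 'pm']
print(non_capturing)  # ['11:30 am', '2:20 pm']
```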

Answer 2

You probably don't even need regular expressions to split this particular string. If applicable, you can use the regular str.split():

>>> s = "11:30 am - 2:20 pm"
>>> s.split(" - ")
['11:30 am', '2:20 pm']

This, of course, does not check that the resulting items are actually time-like strings.
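If that check matters, one possible approach is to split first and then validate each piece. The sketch below is only an illustration: split_times and TIME_PATTERN are hypothetical names, and the pattern is deliberately loose (it would accept something like "99:99 am").

```python
import re

# Hypothetical pattern for illustration: digits, a colon, digits, then am or pm.
TIME_PATTERN = re.compile(r"\d+:\d+ (?:am|pm)")

def split_times(s, sep=" - "):
    """Split s on sep and keep only the parts that look like times."""
    return [part for part in s.split(sep) if TIME_PATTERN.fullmatch(part)]

print(split_times("11:30 am - 2:20 pm"))  # ['11:30 am', '2:20 pm']
print(split_times("11:30 am - lunch"))    # ['11:30 am']
```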

Answer 3

Quoting the docs (emphasis mine):

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
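To make the quoted passage concrete, here is a small sketch of what re.findall returns with no groups and with two groups, using the string from the question. The patterns and variable names are illustrative assumptions, not part of the docs.

```python
import re

text = "11:30 am - 2:20 pm"

# No groups in the pattern: a list of whole matches.
no_groups = re.findall(r"\d+:\d+ [ap]m", text)

# More than one group: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\d+:\d+) (am|pm)", text)

print(no_groups)   # ['11:30 am', '2:20 pm']
print(two_groups)  # [('11:30', 'am'), ('2:20', 'pm')]
```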

You can use re.finditer instead. It yields match objects, so you can take the whole match regardless of groups:

import re
seq = [m.group(0) for m in re.finditer(r"\d+:\d+ (am|pm)", "11:30 am - 2:20 pm")]
# ['11:30 am', '2:20 pm']

3 answers

solution1
4 ACCEPTED 2016-09-05 16:56:07

solution2
1 2016-09-05 17:10:20

solution3
0 2016-09-05 17:02:34
