
Extracting groups in a regex match

I have a set of inputs. I am trying to write a regex to match the following pattern in the input:

Day at Time on location

Example input:

Today at 12:30 PM on Sam's living room

The day, time, and location parts of the text vary from one input to the next.

I wrote the following regex:

import regex as re

input_example = "Today at 12:30 PM on Rakesh's Echo"
regexp_1 = re.compile(r'(\w+) at (\d+):(\d+) (\w+) on (\w+)')
re_match = regexp_1.match(input_example)

This works: the pattern matches the inputs correctly. Now I am trying to extract groups from the match.
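To see why the groups come out wrong, here is a quick check of what the original pattern captures. It uses the standard-library `re` module rather than the third-party `regex` package the question imports as `re`; for a pattern this simple the two should behave the same.

```python
import re

input_example = "Today at 12:30 PM on Rakesh's Echo"
# Same pattern as in the question: five separate capture groups.
regexp_1 = re.compile(r'(\w+) at (\d+):(\d+) (\w+) on (\w+)')
re_match = regexp_1.match(input_example)

# Hour, minute and AM/PM land in three different groups, and
# (\w+) stops at the apostrophe, so only "Rakesh" is captured.
print(re_match.groups())
# ('Today', '12', '30', 'PM', 'Rakesh')
```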

My desired output (for the Sam's living room example) is:

re_match.group(1)
>> "Today"
re_match.group(2)
>> "12:30 PM"
re_match.group(3)
>> "Sam's living room"

However, my current regular expression match does not give me this output. What is the correct regex that will give me the above outputs?

You are pretty close. You just need to adjust your capture groups a bit, like this:

re.compile(r"(\w+) at (\d+:\d+ \w+) on (.+)")

Note that the second capture group now matches the full hour:minute period-of-day. Your final capture group, (\w+), matches a-z, A-Z, 0-9 and _ (plus other Unicode letters and digits in Python 3), but not ', so it captured only a small part of the location. Changing it to .+ lets it match any character except a newline. If you know that only a few characters outside of \w need to be matched, you can use [\w']+ instead, adding whatever other characters you need inside the brackets.
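A small sketch comparing the suggested `.+` pattern with the narrower `[\w']+` alternative on both example inputs from the question. The helper name `extract` is hypothetical, chosen just for this illustration.

```python
import re

# Suggested pattern: whole time in one group, everything after " on " in the last.
pattern_any = re.compile(r"(\w+) at (\d+:\d+ \w+) on (.+)")
# Narrower alternative: word characters plus apostrophes only (no spaces).
pattern_word_apos = re.compile(r"(\w+) at (\d+:\d+ \w+) on ([\w']+)")

def extract(pattern, text):
    """Return the three captured groups, or None if the pattern does not match."""
    m = pattern.match(text)
    return m.groups() if m else None

for text in ["Today at 12:30 PM on Sam's living room",
             "Today at 12:30 PM on Rakesh's Echo"]:
    print(extract(pattern_any, text))
    print(extract(pattern_word_apos, text))
```

Because `[\w']+` excludes spaces, it stops at `Sam's` and `Rakesh's`; for multi-word locations a space would also have to go inside the brackets.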

A good tool for experimenting with and testing your regex is https://regex101.com/; just make sure you select the Python flavor.

You can use nested groups, but that is not very readable: you have to work out the exact number of each group, and later you will forget what that number means.

It's better to use named groups. This is copied from the REPL:

>>> import re
... 
... input_example = "Today at 12:30 PM on Rakesh's Echo"
... regexp_1 = re.compile(r'(?P<day>\w+) at (?P<time>(\d+):(\d+) (\w+)) on (?P<place>\w+)')
... re_match = regexp_1.match(input_example)
>>> list(re_match.groups())
['Today', '12:30 PM', '12', '30', 'PM', 'Rakesh']
>>> re_match.group('day')
'Today'
>>> re_match.group('time')
'12:30 PM'
>>> re_match.group('place')
'Rakesh'
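Combining named groups with the `.+` suggestion from the other answers captures the full location, and `groupdict()` returns every named group at once. This sketch also drops the inner numbered subgroups of `time`, which is a simplification of the REPL pattern above.

```python
import re

input_example = "Today at 12:30 PM on Rakesh's Echo"
# Named groups, with (?P<place>.+) so the apostrophe and space are included.
regexp_named = re.compile(
    r'(?P<day>\w+) at (?P<time>\d+:\d+ \w+) on (?P<place>.+)'
)
re_match = regexp_named.match(input_example)

# groupdict() maps each group name to the text it captured.
print(re_match.groupdict())
# {'day': 'Today', 'time': '12:30 PM', 'place': "Rakesh's Echo"}
```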

I think you want re.compile(r'(\w+) at (\d+:\d+ \w+) on (.+)') instead.

Your second group needs to capture the whole time (two numbers and a word), and your third group needs to accept more than just \w if you want to get apostrophes, etc. I'm suggesting .+, which will get everything to the end of the line.
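Since the question mentions a set of inputs, here is one way to apply the pattern when they arrive as a single multi-line string (an assumption about the input format; the second line is a made-up example). `.` does not match a newline by default, so `.+` stops at the end of each line, and `re.MULTILINE` lets `^` anchor at the start of every line.

```python
import re

# Hypothetical batch of inputs, one per line.
inputs = """Today at 12:30 PM on Sam's living room
Tomorrow at 9:05 AM on Rakesh's Echo"""

# ^ matches at each line start under re.MULTILINE; (.+) ends at the newline.
pattern = re.compile(r"^(\w+) at (\d+:\d+ \w+) on (.+)", re.MULTILINE)

results = [m.groups() for m in pattern.finditer(inputs)]
for day, time, place in results:
    print(day, "|", time, "|", place)
```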

I've tried this and get:

Today

12:30 PM

Rakesh's Echo

Anything you have in parentheses () will be a capture group.
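This holds for plain parentheses; a group written as `(?:...)` groups without capturing and gets no number. As a sketch, assuming some inputs might use a 24-hour time with no AM/PM (the second input below is hypothetical), a non-capturing group can make the suffix optional while the pattern still has exactly three numbered groups.

```python
import re

# (?: [AP]M)? is a non-capturing optional group: it makes the AM/PM
# suffix optional without adding a new group number.
pattern = re.compile(r"(\w+) at (\d+:\d+(?: [AP]M)?) on (.+)")

m1 = pattern.match("Today at 12:30 PM on Sam's living room")
m2 = pattern.match("Today at 18:45 on Rakesh's Echo")  # hypothetical 24-hour input

print(pattern.groups)  # 3: only the plain (...) groups are counted
print(m1.groups())     # ('Today', '12:30 PM', "Sam's living room")
print(m2.groups())     # ('Today', '18:45', "Rakesh's Echo")
```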

Try this: (\w*) at (\d+:\d+ \w+) on (.*)

So then,

1st group --> \w*

2nd group --> \d+:\d+ \w+

3rd group --> .*

Which gives you:

1st group --> Today

2nd group --> 12:30 PM

3rd group --> Rakesh's Echo
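The difference between `*` and `+` only shows up when a part is empty: `(.*)` accepts an input with nothing after "on ", while `(.+)` rejects it. The truncated input below is hypothetical, used only to show the contrast.

```python
import re

star = re.compile(r"(\w*) at (\d+:\d+ \w+) on (.*)")
plus = re.compile(r"(\w+) at (\d+:\d+ \w+) on (.+)")

truncated = "Today at 12:30 PM on "  # hypothetical input with an empty location

m_star = star.match(truncated)
m_plus = plus.match(truncated)

print(m_star.groups())  # ('Today', '12:30 PM', ''): empty third group
print(m_plus)           # None: .+ needs at least one character
```

Which one to use depends on whether an empty location should count as a valid input.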
