Python regex not capturing groups properly

Question

I have the following regex: (?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))

Input text examples:

RE:11567 Miss Jane Doe 12345678
Reference: Miss Jane Doe 12345678
RE:J123 Miss Jane Doe 12345678
RE:J123 Miss Jane Doe 12345678 Reference: Test Company

Sample Code:

import re

# Raw string so that \w and \s reach the regex engine unchanged
pattern = re.compile(r'(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))')
result = pattern.findall('RE:11693 Miss Jane Doe 12345678')
print(result)

For all four inputs I expect a single match, ('Miss Jane Doe', 'Miss', 'Jane', 'Doe'). However, for the 4th example I get two: [('Miss Jane Doe', 'Miss', 'Jane', 'Doe'), (' Test Company', '', 'Test', 'Company')]

How can I get the correct output?

Answer 1

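Add ^ to the start of the regex so that it only matches at the start of the string. The extra tuple in the 4th example comes from the trailing "Reference: Test Company", which the unanchored pattern also matches. The anchored regex is ^(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))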
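To check the anchored pattern against all four inputs from the question, here is a minimal sketch. The names ANCHORED, SAMPLES and extract_names are hypothetical, introduced only for this illustration. The pattern is the one from the question with ^ prepended and written as a raw string.

```python
import re

# Pattern from the question, anchored with ^ (raw string keeps \w and \s intact).
ANCHORED = re.compile(
    r'^(?:RE:\w+|Reference:)\s*((Mr|Mrs|Ms|Miss)?\s+([\w-]+)\s(\w+))'
)

# The four sample inputs from the question.
SAMPLES = [
    'RE:11567 Miss Jane Doe 12345678',
    'Reference: Miss Jane Doe 12345678',
    'RE:J123 Miss Jane Doe 12345678',
    'RE:J123 Miss Jane Doe 12345678 Reference: Test Company',
]


def extract_names(text):
    """Return the findall() result for the anchored pattern (hypothetical helper)."""
    return ANCHORED.findall(text)


for sample in SAMPLES:
    # Each input should now yield exactly one tuple.
    print(extract_names(sample))
```

Without re.MULTILINE, ^ matches only at the very start of the string, so the later "Reference: Test Company" can no longer start a match. If the input were a multi-line block with one reference per line, re.MULTILINE would probably be needed; for single-line inputs like these, pattern.match() should work as an alternative to anchoring.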

1 answer

solution1
1 ACCEPTED 2022-12-21 03:36:48
