
Regex to match part of a string without capturing it

I am trying to capture dates that can appear in a string like this:

'30 jan and 6 apr and 12 oct 2022'

I am using the Python regex module (it is the same as re but has an 'overlapped' option). I need the end result to be this list:

['30 jan 2022', '6 apr 2022', '12 oct 2022']

So far, with this expression:

regex.findall(r'(?:\d\d | \d )(?:jan|feb|mar|ap|may|jun|jul|aug|sep|oct|nov|dec)(?:.*)20(?:\d\d)', d, overlapped=True)

I am getting:

['30 jan and 6 apr and 12 oct 2022', ' 6 apr and 12 oct 2022', '12 oct 2022']

Thanks in advance.

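You can use two capture groups and a list comprehension. The first group captures the day and month. The second group sits inside a lookahead, (?=.*\b(20\d\d)), so it captures the year further along the string without consuming it. Because nothing past the month is consumed, every date in the string can be matched in turn, and the standard re module is enough (no overlapped option needed). Note that the alternation needs apr rather than ap, otherwise the full month name is not matched.

\b(\d+ (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec))(?=.*\b(20\d\d))\b

import re

pattern = r"\b(\d+ (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec))(?=.*\b(20\d\d))\b"
s = "30 jan and 6 apr and 12 oct 2022"

# findall returns one ("day month", "year") tuple per match; join each tuple into one string
res = [' '.join(parts) for parts in re.findall(pattern, s)]
print(res)

Output

['30 jan 2022', '6 apr 2022', '12 oct 2022']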
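
To see that the lookahead reads the year without consuming it, the same idea can be sketched with re.finditer, which exposes the span of each match. The names MONTHS, DATE_RE and expand_dates are made up for this illustration, and the sketch assumes lowercase month abbreviations and a single 20xx year at the end of the string, as in the question.

```python
import re

# Hypothetical helper names; the pattern mirrors the lookahead approach above.
MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
DATE_RE = re.compile(rf"\b(\d+ (?:{MONTHS}))(?=.*\b(20\d\d))\b")

def expand_dates(text):
    """Return (date string, match span) pairs for each day/month found in text."""
    results = []
    for m in DATE_RE.finditer(text):
        day_month, year = m.groups()
        # m.span() covers only the day and month: the lookahead captured
        # the year but did not advance the match position.
        results.append((f"{day_month} {year}", m.span()))
    return results

for date, span in expand_dates("30 jan and 6 apr and 12 oct 2022"):
    print(date, span)
```

Each span ends right after the month, which is why the next search can still find the following date and the shared year.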

Note that (?:.*) and (?:\d\d) do not need the non-capturing group. No quantifier or alternation is applied to the group as a whole, so .* and \d\d behave exactly the same on their own.
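
A quick way to check this is to run the pattern from the question with and without the extra groups. This is only a sketch: it uses the standard re module without the overlapped option, so it returns a single greedy match, and the variable names grouped and plain are made up for the comparison.

```python
import re

d = "30 jan and 6 apr and 12 oct 2022"
months = "jan|feb|mar|ap|may|jun|jul|aug|sep|oct|nov|dec"  # alternation as written in the question

# Pattern from the question, with the redundant non-capturing groups
grouped = rf"(?:\d\d | \d )(?:{months})(?:.*)20(?:\d\d)"
# Same pattern with the redundant groups removed
plain = rf"(?:\d\d | \d )(?:{months}).*20\d\d"

# Both patterns produce identical matches
print(re.findall(grouped, d) == re.findall(plain, d))
```

The groups around the day and the month alternation are still needed, because they limit the scope of the | operator.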
