
Regular expression match for multiple patterns

I need help with regexes that match multiple patterns, but the code doesn't seem to be working. I want to extract the text matching the regex patterns for 'experience' in a resume.

    regex1 = '(?P<fmonth>\w+.\d+)\s*(\D|to)\s*(?P<smonth>\w+.\d+|present)'
    regex2 = '(?P<day>\d{1,2})\s*(?P<tmonth>\w+.\d+)\s*(\D|-)\s*(?P<bmonth>\w+.\d+|present)'
    regex3 = '(0[1-9]|1[0-2])/?([0-9]{4})\s*(\D|-)\s*(0[1-9]|1[0-2])/?([0-9]{4})'
    regex4= '(\d{4}-\d{2})\s*(\D|-)\s*(\d{4}-\d{2}|present)'
    regexList = [regex1,regex2,regex3,regex4]
    for regex in regexList:
        # experience= re.findall(regex,line)
        experience = re.match(regex,line)
        exp_.append(experience)
        print(exp_)

But the match always returns None, even though a matching date format is present in the resume.

Sample Input: 12/2020 - 04/2021

Desired Output: the total experience, calculated from the date ranges in the resume

The code in the question is not executable as posted because some parts are missing, but I tried something that may help in understanding the problem.

I think you can achieve what you want by carefully creating capturing groups. Based on the sample input you provided, 12/2020 - 04/2021, I came up with this solution.

I created two regexes in this example. They share the same pattern up to capturing group 3. regex2 ends differently: it captures the word Present instead of a date, so it has no capturing groups 4 and 5.
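One possible reason for the None results, assuming the resume line has other text before the date range: re.match only tries to match at the very start of the string, while re.search scans the whole string. The sketch below uses regex3 from the question on a hypothetical resume line (the line text is an assumption, not taken from the question).

```python
import re

# regex3 from the question, written as a raw string
regex3 = r'(0[1-9]|1[0-2])/?([0-9]{4})\s*(\D|-)\s*(0[1-9]|1[0-2])/?([0-9]{4})'

# Hypothetical resume line: the date range is preceded by other text
line = "Software Engineer, 12/2020 - 04/2021"

anchored = re.match(regex3, line)   # only tries to match at position 0
scanned = re.search(regex3, line)   # tries every position in the string

print(anchored)          # None
print(scanned.group(0))  # 12/2020 - 04/2021
```

If the real lines start directly with the date, the cause is likely something else, such as leading whitespace or a different date format.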

group1 : captures start month

group2 : captures start year

group3 : captures full end date with regex1 or word Present with regex2

group4 : captures end month if end date is not equal to word Present

group5 : captures end year if end date is not equal to word Present
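To make the group numbering concrete, here is a small sketch that prints what each pattern captures for the two sample lines. The variable names are placeholders chosen for illustration.

```python
import re

# The two patterns described above, written as raw strings
pattern_range = r'(\d{2})/(\d{4})\s-\s((\d{2})/(\d{4}))'
pattern_present = r'(\d{2})/(\d{4})\s-\s(Present)'

range_match = re.search(pattern_range, "12/2020 - 04/2021")
present_match = re.search(pattern_present, "05/2021 - Present")

# Groups 1-5: start month, start year, full end date, end month, end year
print(range_match.groups())    # ('12', '2020', '04/2021', '04', '2021')

# Groups 1-3 only: start month, start year, the word Present
print(present_match.groups())  # ('05', '2021', 'Present')
```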

Note that I have not handled all the exceptions that could occur with various inputs.

import re
from datetime import datetime

from dateutil import relativedelta

line = """
12/2020 - 04/2021
05/2021 - Present
"""

# Raw strings keep regex escapes such as \d and \s intact
regex1 = r'(\d{2})/(\d{4})\s-\s((\d{2})/(\d{4}))'
regex2 = r'(\d{2})/(\d{4})\s-\s(Present)'
regexList = [regex1, regex2]

exp_ = 0  # total experience in months
for regex in regexList:
    for date_match in re.finditer(regex, line):
        start = datetime(int(date_match.group(2)), int(date_match.group(1)), 1)
        if date_match.group(3) == "Present":
            # Open-ended range: count up to the first day of the current month
            today = datetime.today()
            end = datetime(today.year, today.month, 1)
        else:
            end = datetime(int(date_match.group(5)), int(date_match.group(4)), 1)
            # Add one month so the end month counts in full; adding it to the
            # date (not to the month number) also works when the end month is 12
            end += relativedelta.relativedelta(months=1)
        delta = relativedelta.relativedelta(end, start)
        exp_ += delta.months + (12 * delta.years)

print("Total Experience = " + str(exp_ // 12) + " years " + str(exp_ % 12) + " months")

Result (the total depends on the date the script is run, because the second range ends at Present)

Total Experience = 0 years 7 months
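As a variation, the two patterns could be merged into a single regex with named groups and an alternation for the end of the range. This is only a sketch under assumptions: the names DATE_RANGE and total_months, the optional today parameter, and the fixed date of 15 July 2021 are all made up for illustration, and it uses plain month arithmetic from the standard library instead of dateutil.

```python
import re
from datetime import date

# Hypothetical combined pattern: "MM/YYYY - MM/YYYY" or "MM/YYYY - Present"
DATE_RANGE = re.compile(
    r'(?P<start_month>\d{2})/(?P<start_year>\d{4})\s*-\s*'
    r'(?:(?P<end_month>\d{2})/(?P<end_year>\d{4})|(?P<present>present))',
    re.IGNORECASE,
)


def total_months(text, today=None):
    """Sum the months covered by every date range found in text."""
    today = today or date.today()
    total = 0
    for m in DATE_RANGE.finditer(text):
        start = int(m.group('start_year')) * 12 + int(m.group('start_month'))
        if m.group('present'):
            # Open-ended range: count up to the current month
            end = today.year * 12 + today.month
        else:
            # +1 so that the end month counts in full
            end = int(m.group('end_year')) * 12 + int(m.group('end_month')) + 1
        total += end - start
    return total


resume = """
12/2020 - 04/2021
05/2021 - Present
"""

# A fixed date (an assumption) keeps the output reproducible
months = total_months(resume, today=date(2021, 7, 15))
print(f"Total Experience = {months // 12} years {months % 12} months")
# Total Experience = 0 years 7 months
```

Passing today explicitly makes the function easy to test; leaving it out falls back to the current date, as in the code above.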
