How can I extract month and year from a string in Python?
Input text:
text = """Wipro Limited | Hyderabad, IN Dec 2017 – Present
Project Analyst
Infosys | Delhi, IN Apr 2017 – Nov 2017
Software Developer
HCL Technologies | Hyderabad, IN Jun 2016 – Mar 2017
Software Engineer
"""
I have written code for this, but it returns a list with a separate entry for each extracted word, and I can't do anything useful with it.
import re

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s+\–\s+(?P<month1>[a-zA-Z]+)\s+(?P<year1>\d{4})')
mat = regex.findall(text)
mat
Check out the code: https://regex101.com/r/mMlgYp/1. I want output like the example below so that I can preview the dates, take the difference between each pair, and then calculate the total experience. Here "Present" or "Till date" should be treated as the current month and year.
import time
Present = time.strftime("%m-%Y")
Present
# output: '05-2020'
#Desired output
Extracted dates:
[('Dec 2017 - Present'),
('Apr 2017 - Nov 2017'),
('Jun 2016 - Mar 2017')]# and so on ...should display all the search results
First experience: 1.9 years
second experience: 8 months
third experience: 7 months
# and so on ...should display all the search results
Total experience: 3.4 years
Please help me with this; I'm new to programming languages, NLP, and regex.
You probably ultimately want this in a dataframe, since you tagged the question pandas (see Andrej's answer), but either way you can parse the dates from the string with this interpolated pattern:
fr"(?i)((?:{months}) *\d{{4}}) *(?:-|–) *(present|(?:{months}) *\d{{4}})"
where {months} is an alternation group of all possible month names and abbreviations.
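To make the interpolation concrete, here is a minimal sketch of how the pattern is assembled and what `re.findall` returns for it. The `sample` string is made up for illustration, and the month names assume an English locale (`calendar` names are locale-dependent).

```python
import calendar
import re

# Alternation of abbreviated and full month names: "Jan|Feb|...|December".
months = "|".join(calendar.month_abbr[1:] + calendar.month_name[1:])

# In an f-string, {months} is substituted while {{4}} becomes a literal {4}.
pattern = fr"(?i)((?:{months}) *\d{{4}}) *(?:-|–) *(present|(?:{months}) *\d{{4}})"

# Hypothetical sample text mixing abbreviated and full month names.
sample = (
    "Infosys | Delhi, IN Apr 2017 – Nov 2017\n"
    "Wipro | Hyderabad, IN December 2017 - present"
)

# With two capturing groups, findall yields one (start, end) tuple per match.
matches = re.findall(pattern, sample)
print(matches)
```

Because the pattern has exactly two capturing groups, each match unpacks directly as `for start, end in matches`, which is easier to work with than a list of four separate words per match.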
import calendar
import re
from datetime import datetime
from dateutil.relativedelta import relativedelta
text = """Wipro Limited | Hyderabad, IN Dec 2017 – Present
Project Analyst
Infosys | Delhi, IN Apr 2017 – Nov 2017
Software Developer
HCL Technologies | Hyderabad, IN Jun 2016 – Mar 2017
Software Engineer
"""
def parse_date(x, fmts=("%b %Y", "%B %Y")):
    # Try the abbreviated month format first, then the full month name.
    for fmt in fmts:
        try:
            return datetime.strptime(x, fmt)
        except ValueError:
            pass

months = "|".join(calendar.month_abbr[1:] + calendar.month_name[1:])
pattern = fr"(?i)((?:{months}) *\d{{4}}) *(?:-|–) *(present|(?:{months}) *\d{{4}})"
total_experience = None

for start, end in re.findall(pattern, text):
    # Treat "Present" as the current month and year.
    if end.lower() == "present":
        today = datetime.today()
        end = f"{calendar.month_abbr[today.month]} {today.year}"

    duration = relativedelta(parse_date(end), parse_date(start))

    if total_experience:
        total_experience += duration
    else:
        total_experience = duration

    print(f"{start}-{end} ({duration.years} years, {duration.months} months)")

if total_experience:
    print(f"total experience: {total_experience.years} years, {total_experience.months} months")
else:
    print("couldn't parse text")
Output:
Dec 2017-May 2020 (2 years, 5 months)
Apr 2017-Nov 2017 (0 years, 7 months)
Jun 2016-Mar 2017 (0 years, 9 months)
total experience: 3 years, 9 months
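If the goal is the "N months" / "X.Y years" style from the question rather than separate years and months, one possible variation is to count whole months and format the total. This is only a sketch using the standard library instead of `relativedelta`; the helper names `months_between` and `format_experience` are hypothetical, and the day of the month is ignored.

```python
from datetime import datetime

def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def format_experience(months):
    """Render a month count as 'N months' below one year, else 'X.Y years'."""
    if months < 12:
        return f"{months} months"
    return f"{months / 12:.1f} years"

# Two of the closed date ranges from the example text.
ranges = [("Apr 2017", "Nov 2017"), ("Jun 2016", "Mar 2017")]

spans = [
    months_between(datetime.strptime(s, "%b %Y"), datetime.strptime(e, "%b %Y"))
    for s, e in ranges
]

for (s, e), n in zip(ranges, spans):
    print(f"{s} - {e}: {format_experience(n)}")
print(f"Total experience: {format_experience(sum(spans))}")
```

Summing integer month counts avoids rounding each job separately, so the total is rounded only once when it is formatted.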
import re
import numpy as np
import pandas as pd
text = '''Wipro Limited | Hyderabad, IN Dec 2017 – Present
Project Analyst
Infosys | Delhi, IN Apr 2017 – Nov 2017
Software Developer
HCL Technologies | Hyderabad, IN Jun 2016 – Mar 2017
Software Engineer
'''
def pretty_format(months):
    # Show durations of a year or more in years, shorter ones in months.
    return f'{months/12:.1f} years' if months > 11 else f'{months:.1f} months'

data = []
for employer, d1, d2 in re.findall(r'(.*?)\s*\|.*([A-Z][a-z]{2} [12]\d{3}) – (?:([A-Z][a-z]{2} [12]\d{3})|Present)', text):
    # d2 is empty when the range ends in "Present"; store NaN so it becomes NaT.
    data.append({'Employer': employer, 'Begin': d1, 'End': d2 or np.nan})
df = pd.DataFrame(data)
df['Begin'] = pd.to_datetime(df['Begin'])
df['End'] = pd.to_datetime(df['End'])
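# Note: newer pandas releases may reject the ambiguous 'M' unit below; if so,
# pd.Timedelta(days=30.436875) (an average month, 365.2425 / 12 days) could be used instead.
df['Experience'] = ((df['End'].fillna(pd.to_datetime('now')) - df['Begin']) / np.timedelta64(1, 'M')).apply(pretty_format)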
print(df)
total = np.sum(df['End'].fillna(pd.to_datetime('now')) - df['Begin']) / np.timedelta64(1, 'M')
print()
print(f'Total experience = {pretty_format(total)}')
Prints:
           Employer      Begin        End  Experience
0     Wipro Limited 2017-12-01        NaT   2.5 years
1           Infosys 2017-04-01 2017-11-01  7.0 months
2  HCL Technologies 2016-06-01 2017-03-01  9.0 months

Total experience = 3.8 years