Convert months and years into days in pandas
I have a DataFrame whose details column contains durations in months and years, and I want to convert them into days.
Name details
prem 6 months probation included
shaves 3 years 6 months suspended
geroge 48 hours work time
julvie 4 years 20 days terms included
tiz 80 days work
lamp 44 days work
Here I want to convert 3 years to 1095 days and 6 months to 186 days (leap years can also be taken into account). I also want to remove all other words, such as "probation included" and "suspended", and put all the results in a new column.
Expected result:
Name details Time
prem 6 months probation included 186 days
shaves 3 years 6 months suspended 1281 days
geroge 48 hours work time 48 hours
julvie 4 years 20 days terms included 1480 days
tiz 80 days work 80 days
lamp 44 days work 44 days
Use Series.str.extract to get the number and its unit (years, months, hours or days), then multiply the number by a per-unit scalar looked up with Series.map. Because no start date is specified, the scalars are approximations (they could be more precise, e.g. year=365.2564days). Finally, add the unit by condition with numpy.where:
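import numpy as np

# multiplier per unit: years and months become days, hours stay hours
d = {'months': 31, 'years': 365, 'hours': 1, 'days': 1}
# extract the first number and the unit that follows it
df1 = df['details'].str.extract(r'(\d+)\s+(years|months|hours|days)', expand=True)
df['Time'] = df1[0].astype(float).mul(df1[1].map(d)).astype('Int64').astype(str)
# years, months and days are all reported in days; hours keep their own unit
df['Unit'] = np.where(df1[1].isin(['years', 'months', 'days']), ' days', ' ' + df1[1])
df['Time'] += df.pop('Unit')
print(df)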
Name details Time
0 prem 6 months probation included 186 days
1 shaves 3 years suspended 1095 days
2 geroge 48 hours work time 48 hours
3 julvie 4 years terms included 1460 days
4 tiz 80 days work 80 days
5 lamp 44 days work 44 days
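If a start date were known, the day count could be taken from the calendar, leap years included, instead of from fixed scalars. The following is only a minimal sketch of that idea: the start date 2020-01-01 and the helper name exact_days are assumptions for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical start date; the question does not specify one.
START = pd.Timestamp('2020-01-01')

def exact_days(years=0, months=0, days=0, start=START):
    """Calendar days between `start` and `start` shifted by the given offset."""
    end = start + pd.DateOffset(years=years, months=months, days=days)
    return (end - start).days

# 2020 is a leap year, so the results differ from the 365/31 approximation.
print(exact_days(months=6))           # 182
print(exact_days(years=3, months=6))  # 1277
```

With a different start date the same strings can map to a different number of days, which is why the fixed scalars above are only an approximation.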
EDIT: If multiple units can appear in one string, you can use:
import pandas as pd

# dictionary of multipliers for converting each unit to days
d = {'months': 31, 'years': 365, 'days': 1}
# extract each unit's number and multiply it by the value from the dictionary
out = {k: df['details'].str.extract(rf'(\d+)\s+{k}', expand=False).astype(float).mul(v)
       for k, v in d.items()}
# join together, sum, and convert to an 'N days' string, replacing '0 days' with ''
days = pd.concat(out, axis=1).sum(axis=1).astype(int).astype('str').add(' days').replace('0 days', '')
# extract hours
hours = df['details'].str.extract(r'(\d+\s+hours)', expand=False).radd(' ').fillna('')
# join together
df['Time'] = days + hours
print(df)
Name details Time
0 john 2 years 1 months 10 days 15 hours work time 771 days 15 hours
1 prem 6 months probation included 186 days
2 shaves 3 years 6 months suspended 1281 days
3 geroge 48 hours work time 48 hours
4 julvie 4 years 20 days terms included 1480 days
5 tiz 80 days work 80 days
6 lamp 44 days work 44 days
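The same multi-unit logic can also be sketched as a plain Python function applied row by row, which may be easier to test in isolation. The helper name to_time is hypothetical, the factors of 365 and 31 days are carried over from the code above, and accepting singular units such as "1 month" is an addition that the original code does not make.

```python
import re
import pandas as pd

# Approximate days per unit, as above (no start date is known).
DAYS_PER_UNIT = {'years': 365, 'months': 31, 'days': 1}

def to_time(text):
    """Hypothetical helper: sum years/months/days into days, keep hours as hours."""
    total_days = 0
    for unit, factor in DAYS_PER_UNIT.items():
        # unit[:-1] drops the plural "s"; "s?" then accepts "1 month" or "6 months".
        m = re.search(rf'(\d+)\s+{unit[:-1]}s?\b', text)
        if m:
            total_days += int(m.group(1)) * factor
    parts = []
    if total_days:
        parts.append(f'{total_days} days')
    h = re.search(r'(\d+)\s+hours?\b', text)
    if h:
        parts.append(f'{h.group(1)} hours')
    return ' '.join(parts)

df = pd.DataFrame({
    'Name': ['shaves', 'geroge', 'julvie'],
    'details': ['3 years 6 months suspended', '48 hours work time',
                '4 years 20 days terms included'],
})
df['Time'] = df['details'].map(to_time)
print(df)
```

A row-wise function like this is usually slower than the vectorized str.extract version on large frames, so it is better suited to small data or to checking the expected values.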
# alternative: extract each unit into its own column, then combine them
date_cols = ['years', 'months', 'days', 'hours']
for col in date_cols:
    # missing units become '0' so they can be converted to int
    df[col] = df['details'].str.extract(rf'(\d+)\s+{col}', expand=False).fillna('0')
# convert to int
df[date_cols] = df[date_cols].astype(int)
days = df['years'] * 365 + df['months'] * 31 + df['days']
hours = df['hours']
# convert to string and blank out zero values
days = days.astype('str') + ' days'
days[days == '0 days'] = ''
hours = hours.astype('str') + ' hours'
hours[hours == '0 hours'] = ''
df['tag'] = days + ' ' + hours
print(df)
0 Name details years months days hours \
1 prem 6 months probation included 0 6 0 0
2 shaves 3 years 6 months suspended 3 6 0 0
3 geroge 48 hours work time 0 0 0 48
4 julvie 4 years 20 days terms included 4 0 20 0
5 tiz 80 days work 0 0 80 0
6 lamp 44 days work 0 0 44 0
0 tag
1 186 days
2 1281 days
3 48 hours
4 1480 days
5 80 days
6 44 days
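The output above keeps the helper columns years, months, days and hours in df. If only the final column is wanted, one option is to hold the extracted numbers in a separate frame. This is a sketch under the same 365/31 assumptions; the names parts and total_days are illustrative, and the final strip() is an addition that removes the stray space left when one side is empty.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['prem', 'geroge'],
    'details': ['6 months probation included', '48 hours work time'],
})

units = ['years', 'months', 'days', 'hours']
# Extract each unit into a separate helper frame instead of into df itself.
parts = pd.DataFrame({
    u: df['details'].str.extract(rf'(\d+)\s+{u}', expand=False).fillna('0').astype(int)
    for u in units
})
total_days = parts['years'] * 365 + parts['months'] * 31 + parts['days']
# Keep the text only where the value is non-zero.
days = total_days.astype(str).add(' days').where(total_days > 0, '')
hours = parts['hours'].astype(str).add(' hours').where(parts['hours'] > 0, '')
df['tag'] = (days + ' ' + hours).str.strip()
print(df)
```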