Convert months and years into days in pandas
I have a dataframe with durations given in months and years, and I want to convert them into days.
Name details
prem 6 months probation included
shaves 3 years 6 months suspended
geroge 48 hours work time
julvie 4 years 20 days terms included
tiz 80 days work
lamp 44 days work
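Here, I want to convert 3 years to 1095 days and 6 months to 186 days. Leap years could be taken into account as well. I also want to remove all the other words, such as probation included and suspended, and get all the results in a new column.
Expected result: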
Name details Time
prem 6 months probation included 186 days
shaves 3 years 6 months suspended 1281 days
geroge 48 hours work time 48 hours
julvie 4 years 20 days terms included 1480 days
tiz 80 days work 80 days
lamp 44 days work 44 days
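To run the answers that follow, the sample frame from the question can be rebuilt as shown here. This is only a sketch: the column names `Name` and `details` are taken from the table above, and the frame is assumed to be called `df`, as it is in the answers.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    'Name': ['prem', 'shaves', 'geroge', 'julvie', 'tiz', 'lamp'],
    'details': [
        '6 months probation included',
        '3 years 6 months suspended',
        '48 hours work time',
        '4 years 20 days terms included',
        '80 days work',
        '44 days work',
    ],
})
print(df)
```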
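Use Series.str.extract to get the number together with its unit (years, months, hours or days), then multiply the number by a scalar looked up with Series.map. A fixed scalar is used because no start date is specified (a more precise factor is possible, e.g. year = 365.2564 days). Finally, add the unit with a condition in numpy.where: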
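import numpy as np
import pandas as pd

# days per unit; hours are left unconverted
d = {'months': 31, 'years': 365, 'hours': 1, 'days': 1}
# first "<number> <unit>" pair in each row
df1 = df['details'].str.extract(r'(\d+)\s+(years|months|hours|days)', expand=True)
df['Time'] = df1[0].astype(float).mul(df1[1].map(d)).astype('Int64').astype(str)
# years, months and days are reported in days; hours keep their own unit
df['Unit'] = np.where(df1[1].isin(['years', 'months', 'days']), ' days', ' ' + df1[1])
df['Time'] += df.pop('Unit')
print(df)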
Name details Time
0 prem 6 months probation included 186 days
1 shaves 3 years suspended 1095 days
2 geroge 48 hours work time 48 hours
3 julvie 4 years terms included 1460 days
4 tiz 80 days work 80 days
5 lamp 44 days work 44 days
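The fixed factors above ignore leap years and the differing lengths of months. If a start date were known, the exact calendar length could be computed instead. The following is a minimal sketch under that assumption: the helper name `span_in_days` and the start date `2020-01-01` are hypothetical and appear in neither the question nor the answer.

```python
import pandas as pd

def span_in_days(start, years=0, months=0, days=0):
    """Calendar-exact length in days of a span beginning at `start`."""
    start = pd.Timestamp(start)
    # DateOffset does calendar arithmetic, so leap years and month lengths count.
    end = start + pd.DateOffset(years=years, months=months, days=days)
    return (end - start).days

# Hypothetical start date, chosen only for illustration.
start = '2020-01-01'
print(span_in_days(start, years=3, months=6))  # 1277 (vs. 1281 with fixed factors)
print(span_in_days(start, months=6))           # 182 (vs. 186 with fixed factors)
```

The result depends on the start date, which is why a fixed-factor approximation is the only option when no date is given.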
EDIT: If a row can contain multiple units, you can use:
# days per unit
d = {'months': 31, 'years': 365, 'days': 1}
# extract each unit's number and multiply it by the unit's factor
out = {k: df['details'].str.extract(rf'(\d+)\s+{k}', expand=False).astype(float).mul(v)
       for k, v in d.items()}
# join together, sum across units and format as days, blanking out '0 days'
days = pd.concat(out, axis=1).sum(axis=1).astype(int).astype(str).add(' days').replace('0 days', '')
# extract hours
hours = df['details'].str.extract(r'(\d+\s+hours)', expand=False).radd(' ').fillna('')
# join together; strip the leading space left on rows that only have hours
df['Time'] = (days + hours).str.strip()
print(df)
Name details Time
0 john 2 years 1 months 10 days 15 hours work time 771 days 15 hours
1 prem 6 months probation included 186 days
2 shaves 3 years 6 months suspended 1281 days
3 geroge 48 hours work time 48 hours
4 julvie 4 years 20 days terms included 1480 days
5 tiz 80 days work 80 days
6 lamp 44 days work 44 days
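As a variation on the multi-unit idea, every number/unit pair can be pulled out in one pass with Series.str.extractall and then aggregated per row. This is a sketch of a different technique, not the answer's own code. The names `to_days`, `pairs`, `n` and `unit` are made up for illustration, and it keeps numeric `days` and `hours` columns rather than formatted strings.

```python
import pandas as pd

df = pd.DataFrame({
    'details': [
        '2 years 1 months 10 days 15 hours work time',
        '6 months probation included',
        '48 hours work time',
    ],
})

# Same fixed factors as the answer above (31-day months, 365-day years).
to_days = {'years': 365, 'months': 31, 'days': 1}

# One row per (number, unit) match; the first index level is the original row.
pairs = df['details'].str.extractall(r'(?P<n>\d+)\s+(?P<unit>years|months|days|hours)')
pairs['n'] = pairs['n'].astype(int)

is_hours = pairs['unit'].eq('hours')
day_parts = pairs.loc[~is_hours]
days = (day_parts['n'] * day_parts['unit'].map(to_days)).groupby(level=0).sum()
hours = pairs.loc[is_hours, 'n'].groupby(level=0).sum()

# Rows with no match for a unit class are filled with 0.
df['days'] = days.reindex(df.index, fill_value=0)
df['hours'] = hours.reindex(df.index, fill_value=0)
print(df)
```

Keeping the values numeric makes later arithmetic easier. The display string can still be built from the two columns afterwards.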
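Alternatively, extract each unit into its own integer column and combine them:
# extract the number in front of each unit; rows without that unit get 0
date_cols = ['years', 'months', 'days', 'hours']
for col in date_cols:
    df[col] = df['details'].str.extract(rf'(\d+)\s+{col}', expand=False).fillna('0')
# convert to int
df[date_cols] = df[date_cols].astype(int)
days = df['years'] * 365 + df['months'] * 31 + df['days']
hours = df['hours']
# convert to string, blanking out zero values
days = days.astype(str) + ' days'
days[days == '0 days'] = ''
hours = hours.astype(str) + ' hours'
hours[hours == '0 hours'] = ''
# strip the stray space left when one of the two parts is empty
df['tag'] = (days + ' ' + hours).str.strip()
print(df)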
0 Name details years months days hours \
1 prem 6 months probation included 0 6 0 0
2 shaves 3 years 6 months suspended 3 6 0 0
3 geroge 48 hours work time 0 0 0 48
4 julvie 4 years 20 days terms included 4 0 20 0
5 tiz 80 days work 0 0 80 0
6 lamp 44 days work 0 0 44 0
0 tag
1 186 days
2 1281 days
3 48 hours
4 1480 days
5 80 days
6 44 days