
How to extract a date from a filename in Python?

I need to extract the event date written in the filename and put it in a new column called event_date. I assume I can use regex, but I have not worked out the exact pattern to use.

The filename is shown below:

file_name = 'X-Y Cable Installment Monitoring (10-7-20).xlsx'

The (10-7-20) is in mm-dd-yy format.

I expect the result to be df['event_date'] = 2020-10-07

How should I write my script to get the correct date from the filename?

Thanks in advance.

Use str.rsplit() with the datetime module.

Steps:

  1. Extract the date string from between the parentheses.
  2. Convert it into the required date format.

from datetime import datetime

file_name = 'X-Y Cable Installment Monitoring (10-7-20).xlsx'
# Take the text after '(' and then the text before ')'
date = file_name.rsplit('(')[1].rsplit(')')[0]  # '10-7-20'
# Parse as mm-dd-yy, then format as yyyy-mm-dd
date = datetime.strptime(date, "%m-%d-%y").strftime('%Y-%m-%d')  # '2020-10-07'

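Or via regex (reusing file_name from above):

import re
from datetime import datetime

regex = re.compile(r"(\d{1,2}-\d{1,2}-\d{2})")  # pattern to capture the date
matches = regex.findall(file_name)  # ['10-7-20']
date = matches[0]  # raises IndexError if the filename contains no date
date = datetime.strptime(date, "%m-%d-%y").strftime('%Y-%m-%d')  # '2020-10-07'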
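
Both snippets above fail with an exception when a filename has no date in it. As a rough sketch, the regex approach could be wrapped in a helper that returns None in that case. The function name extract_event_date and the constant DATE_IN_PARENS are made up for this illustration, and the pattern assumes the date always sits inside parentheses in mm-dd-yy order, as in the question.

```python
import re
from datetime import datetime

# Matches a date such as (10-7-20) inside parentheses; mm-dd-yy order is assumed.
DATE_IN_PARENS = re.compile(r"\((\d{1,2}-\d{1,2}-\d{2})\)")


def extract_event_date(file_name):
    """Return the date in file_name as 'YYYY-MM-DD', or None if absent or invalid."""
    match = DATE_IN_PARENS.search(file_name)
    if match is None:
        return None
    try:
        parsed = datetime.strptime(match.group(1), "%m-%d-%y")
    except ValueError:
        # Digits matched the pattern but do not form a real date, e.g. (13-40-20)
        return None
    return parsed.strftime("%Y-%m-%d")


print(extract_event_date("X-Y Cable Installment Monitoring (10-7-20).xlsx"))
print(extract_event_date("no date here.xlsx"))
```

If the filenames already live in a pandas column (say a hypothetical df['file_name']), the helper could be passed to Series.apply to fill df['event_date'].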

