简体   繁体   中英

How to convert a date which maybe in any format to a specific format in python?

I want to convert a date into a specific 'yy-mm-dd' format. The input can be in any format, for example, it can be twenty-sixth May twenty twenty or 26-05-2020 or 26/05/2020 or May 26,2020 or May twenty-sixth etc. The output for all the above scenarios should be 2020-05-26

You will have to use regular expressions. I wrote a function below which does some of what you asked.

It covers:

  • dd-mm-yyyy
  • dd/mm/yyyy
  • mm-dd-yyyy
  • dd/mm/yyyy

It does not cover May 26,2020 or May twenty-sixth (hoping someone can help with that, because I did not have enough time) But I hope it's a reasonable start at least. You can build on that if you know how to use regex.

I dont know what your input format is. I have assumed it is a DataFrame where the date column has a consistent format. Otherwise this exercise is not possible because you could have something like "02-02-2020" which could mean "dd-mm-yyyy" or "mm-dd-yyyy".

This function checks the entire column, get the "max" date (which hopefully contains a day which is over 12) and identify the day and month columns. Then depending on what format the column is, it accordingly reformats it to "yyyy-mm-dd".

import re

def clean_date_format(date_col):

        # replace "-" with "/"
        new_date = (date_col.str.replace('-', '/')).tolist()

        # check if US format
        first2_regex = r'^(\d*)\/.*'
        second2_regex = r'^.*\/(\d*)\/.*'

        r1 = re.compile(first2_regex)
        r2 = re.compile(second2_regex)

        first2_list = set([re.search(r1, x).group(1) for x in new_date])
        second2_list = set([re.search(r2, x).group(1) for x in new_date])

        first2_condition =  max(list(map(int,first2_list))) <= 12 # i.e. first column contains month
        second2_condition =  max(list(map(int,second2_list))) > 12 # i.e. second column contains day

        US_FORMAT = first2_condition & second2_condition


        if US_FORMAT:
            new_date = pd.DataFrame([datetime.strptime(d, "%m/%d/%Y").strftime("%Y-%m-%d") for d in new_date])
        else:
            new_date = pd.DataFrame([datetime.strptime(d, "%d/%m/%Y").strftime("%Y-%m-%d") for d in new_date])
        return new_date

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM