Is there a better way to do date parsing in Python than this?

I'm trying to parse free-form date strings into meaningful dates. So far I've come up with this function:

"""Parse raw date string into YYYY-MM-DD"""
def __parseDate(self, rawDate):
    if len(rawDate) == 0:
        return u""
    if "{{Birth year and age|" in rawDate:
        rawDate = rawDate.replace("{{","").replace("}}","")
        year = rawDate.split("|")[1].strip()
        return year + "-01-01"
    elif "{{Birth date and age|" in rawDate:
        rawDate = rawDate.replace("{{","").replace("}}","")
        year = rawDate.split("|")[1].strip()
        month = rawDate.split("|")[2].strip()
        day = rawDate.split("|")[3].strip()
        if len(month) == 1:
            month = "0" + month
        if len(day) == 1:
            day = "0" + day
        return year + "-" + month + "-" + day
    elif "{{" in rawDate:
        self.__log(u"XXX Date parse error (unknown template): " + rawDate)
        return u"1770-01-01"
    elif re.match("([a-zA-Z]* [0-9][0-9]?, [0-9][0-9][0-9][0-9])", rawDate):
        match = re.findall("([a-zA-Z]* [0-9][0-9]?, [0-9][0-9][0-9][0-9])", rawDate)[0]
        parts = match.replace(",","").split(" ")
        year = parts[2].strip()
        month = parts[0].replace(".","").strip()
        day = parts[1].strip()
        tryAgain = False
        try:
            month = str(strptime(month,'%B').tm_mon)
        except:
            tryAgain = True
            pass
        try:
            if tryAgain:
                month = str(strptime(month,'%b').tm_mon)
        except:
            self.__log(u"XXX Date parse error: " + rawDate)
            return u"1770-01-01"
            pass

        if len(month) == 1:
            month = "0" + month
        if len(day) == 1:
            day = "0" + day
        return year + "-" + month + "-" + day
    elif re.match("[0-9][0-9][0-9][0-9]-[0-9][0-9]?-[0-9][0-9]?", rawDate):
        parts = rawDate.split("-")
        year = parts[0].strip()
        month = parts[1].strip()
        day = parts[2].strip()
        if len(month) == 1:
            month = "0" + month
        if len(day) == 1:
            day = "0" + day
        return year + "-" + month + "-" + day
    else:
        self.__log(u"XXX Date parse error: " + rawDate)
        return u"1770-01-01"

Am I on the right track, or is there a better way to go about this?
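Whatever overall approach is chosen, the zero-padding repeated in several branches of the function can be handed to `datetime.date`, which pads the fields and also rejects impossible dates such as February 30. A minimal sketch; `to_iso` is a hypothetical helper name, not part of the original code:

```python
import datetime

def to_iso(year, month, day):
    """Return 'YYYY-MM-DD' for the given parts, or None if they are not a real date.

    Accepts ints or digit strings such as "8" or " 08 " (int() ignores
    surrounding whitespace).
    """
    try:
        # date() validates the ranges; isoformat() zero-pads month and day
        return datetime.date(int(year), int(month), int(day)).isoformat()
    except ValueError:
        # Non-numeric input or an impossible date such as 1967-02-30
        return None

print(to_iso("1967", "8", "23"))  # 1967-08-23
print(to_iso(1967, 2, 30))        # None
```

Returning `None` instead of a sentinel date makes a failed parse impossible to mistake for a real value.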

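Update: by "free-form string" I mean that the data comes from wiki pages, specifically the Persondata template. The date fields in that template are free-form because people type whatever they like into them. Usually the value is a date in some arbitrary format, or another wiki template that describes a date. Here are some examples from that field: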

{{Birth year and age|1933}}
August 23, 1967
1990-01-29
23 August 1967
1999
a;lsdfhals;djkfh
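For the sample values above, a list of `datetime.strptime` formats plus a small regex for the two wiki templates covers every case. This is a sketch under assumptions: the format list only covers the samples shown, the template names are taken from the question, and the hypothetical `parse_date` returns `None` instead of the `1770-01-01` sentinel so callers can tell failures apart:

```python
import re
from datetime import datetime

# Formats tried in order for plain (non-template) dates.
# Assumption: this list only covers the sample inputs from the question.
FORMATS = [
    "%Y-%m-%d",   # 1990-01-29
    "%B %d, %Y",  # August 23, 1967
    "%b %d, %Y",  # Aug 23, 1967
    "%d %B %Y",   # 23 August 1967
    "%d %b %Y",   # 23 Aug 1967
    "%Y",         # 1999 -> 1999-01-01
]

# Matches {{Template name|arg1|arg2|...}} and captures the name and the arguments
TEMPLATE_RE = re.compile(r"^\{\{\s*([^|}]+?)\s*\|([^}]*)\}\}$")


def parse_date(raw):
    """Return 'YYYY-MM-DD' for a free-form date string, or None if unrecognised."""
    raw = raw.strip()
    if not raw:
        return None

    m = TEMPLATE_RE.match(raw)
    if m:
        name = m.group(1).lower()
        # Keep positional arguments only; named ones such as df=yes are skipped
        # (an assumption about how these templates are written).
        args = [a.strip() for a in m.group(2).split("|") if "=" not in a]
        try:
            if name == "birth year and age":
                return datetime(int(args[0]), 1, 1).date().isoformat()
            if name == "birth date and age":
                y, mo, d = (int(a) for a in args[:3])
                return datetime(y, mo, d).date().isoformat()
        except (ValueError, IndexError):
            pass  # missing or non-numeric arguments
        return None  # unknown template or bad arguments

    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


for s in ["{{Birth year and age|1933}}", "August 23, 1967", "1990-01-29",
          "23 August 1967", "1999", "a;lsdfhals;djkfh"]:
    print(repr(s), "->", parse_date(s))
```

Month names for `%B` and `%b` are matched using the current locale; Python starts in the C locale, where they are English. Adding a new input style is then a one-line change to `FORMATS` rather than another `elif` branch.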

The ultimate option is probably parsedatetime.

Another option is dateutil.
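With dateutil, the plain-date branches can collapse into one call to `dateutil.parser.parse`. A sketch, assuming the third-party `python-dateutil` package is installed; `parse_with_dateutil` is a hypothetical wrapper name. The `default` argument matters for inputs such as `1999`: fields missing from the string are copied from it, and without it they come from today's date:

```python
from datetime import datetime

from dateutil import parser  # third-party: pip install python-dateutil

# Month/day missing from the input are taken from this default
DEFAULT = datetime(2000, 1, 1)


def parse_with_dateutil(raw):
    """Return 'YYYY-MM-DD', or None if dateutil cannot make sense of the string."""
    try:
        return parser.parse(raw, default=DEFAULT).date().isoformat()
    except (ValueError, OverflowError):
        # dateutil raises a ValueError subclass for unrecognised strings
        return None


for s in ["August 23, 1967", "23 August 1967", "1990-01-29", "1999", "a;lsdfhals;djkfh"]:
    print(repr(s), "->", parse_with_dateutil(s))
```

dateutil knows nothing about wiki markup, so the `{{Birth ...}}` templates still need the special-casing from the question before falling back to `parse`. For ambiguous numeric dates such as `01/02/1999`, `parse` assumes month first unless `dayfirst=True` is passed.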


Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM