Convert dates in Python from a text file with text

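I have a text file (UTF-8 encoded) that contains a lot of text and dates in dd/mm/yyyy format, with years ranging from 1970 to 2022. I want to read this file and convert the dates to yyyy-mm-dd format while keeping all other text unchanged. Do you know how to do this in Python? I also don't mind using other tools (e.g. awk or sed), as long as the rest of the file is not affected.

Optionally, I would also like to search for dates that have no leading zero in the day or month and convert them as well. But first I want to display them (I am not sure whether there are any such dates).

It is important not to convert other strings, so if the "year" is not between 1970 and 2022, the string must not be converted.

I wrote this program, but it needs debugging, and I don't know how to write the _repl function correctly.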

import re
import io


def _repl(s):
    x = s.split("/")
    if ((len(x) == 3) and (0 < int(x[0]) <= 31) and (0 < int(x[1]) <= 12) and (1970 <= int(x[2]) <= 2022)):
        return "{:04d}-{:02d}-{:02d}".format(int(x[2]), int(x[1]), int(x[0]))
    return x


with io.open("1.txt", mode="r", encoding="utf-8") as f:
    b = f.readlines()

c = list()
for line in b:
    _line = ""
    while (not (_line == line)):
        # _line = re.sub(pattern=r'([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})', repl=_repl, string=line)
        _line = re.sub(pattern=r'([0-9]{2})/([0-9]{2})/([0-9]{4})', repl=_repl, string=line)
    c.append(_line)

with io.open('2.txt', mode='w', encoding="utf-8") as f:
    for line in c:
        f.write("{}".format(line))

The repl function receives a match object as its single argument. Consider the following simple example:

import re

def repl(m):
    day, month, year = m.groups()
    return '-'.join([year, month, day]) if 1970 <= int(year) <= 2022 else m.group()

text1 = "Date 01/01/1901 and 01/01/3001 are outside range"
text2 = "Year 2000 should not be changed"
text3 = "Date 01/12/1970 shall be changed"
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text1))
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text2))
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text3))

output

Date 01/01/1901 and 01/01/3001 are outside range
Year 2000 should not be changed
Date 1970-12-01 shall be changed

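Explanation: I use argument unpacking to get the day, month and year, then I check whether the year (as a number) is inside the range [1970, 2022]. If it is, I build a hyphen-separated year, month and day; otherwise I return what was matched unchanged.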
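To make the role of the match object more concrete, here is a minimal sketch that records what re.sub passes to the replacement function. The helper name show_match, the list seen and the sample text are made up for illustration.

```python
import re

seen = []

def show_match(m):
    # m.group() is the whole matched text; m.groups() holds the captured parts
    seen.append((m.group(), m.groups()))
    return m.group()  # hand the match back unchanged

sample = "Date 01/12/1970 shall be changed"
result = re.sub(r'(\d{2})/(\d{2})/(\d{4})', show_match, sample)
print(seen)
print(result == sample)
```

Because show_match returns m.group(), the text comes back unmodified; a real replacement function returns the new string for the matches it wants to change.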

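Thanks to @Daweo for the answer. I only changed it slightly so that it also accepts dates without a leading zero in the day or month.

Here is my program: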

import re
import io


def _repl(m):
    day, month, year = m.groups()
    if ((0 < int(day) <= 31) and (0 < int(month) <= 12) and (1970 <= int(year) <= 2022)):
        return "{:04d}-{:02d}-{:02d}".format(int(year), int(month), int(day))
    else:
        return m.group()


with io.open("1.txt", mode="r", encoding="utf-8") as f:
    b = f.readlines()

c = list()
for line in b:
    _line = re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=line)
    c.append(_line)

with io.open('2.txt', mode='w', encoding="utf-8") as f:
    for line in c:
        f.write("{}".format(line))
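The question also asks to display dates that lack a leading zero before converting them. One possible sketch, assuming the same dd/mm/yyyy layout and year range, is shown here. The names NO_ZERO and dates_without_leading_zero are hypothetical, and the lookarounds are an addition not present in the program above.

```python
import re

# The lookarounds keep a match from starting or ending inside a longer run of digits
NO_ZERO = re.compile(r'(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4})(?!\d)')

def dates_without_leading_zero(text):
    """Return the dates whose day or month is written with a single digit."""
    found = []
    for m in NO_ZERO.finditer(text):
        day, month, year = m.groups()
        if (len(day) == 1 or len(month) == 1) and 1970 <= int(year) <= 2022:
            found.append(m.group())
    return found

sample = "Dates 1/2/1970, 01/12/1972 and 4/11/2022; 1/1/1901 is out of range"
print(dates_without_leading_zero(sample))
```

To inspect a file instead of a sample string, the function could be called on each line read from 1.txt.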

Here are some test texts:

text1 = "Date 01/01/1901 and 01/01/3001 are outside range"
text2 = "Year 2000 should not be changed"
text3 = "Date 01/12/1970 shall be changed"
text4 = "'Date' 99/99/1970 should not be changed"
text5 = "'Date' 13/13/1970 should not be changed"
text6 = "Date 1/2/1970 shall be changed"
text7 = "Dates 01/12/1972,04/11/2022 shall be changed"
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text1))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text2))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text3))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text4))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text5))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text6))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text7))

The output is:

Date 01/01/1901 and 01/01/3001 are outside range
Year 2000 should not be changed
Date 1970-12-01 shall be changed
'Date' 99/99/1970 should not be changed
'Date' 13/13/1970 should not be changed
Date 1970-02-01 shall be changed
Dates 1972-12-01,2022-11-04 shall be changed
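The range checks above accept any day up to 31 for every month, so a string such as 31/02/2000 would still be converted. If that matters, a stricter variant might delegate the check to datetime.date, as in the sketch below. The names DATE_RE, strict_repl and convert are hypothetical, and the lookarounds are an extra assumption meant to avoid matching inside longer digit runs.

```python
import re
from datetime import date

DATE_RE = re.compile(r'(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4})(?!\d)')

def strict_repl(m):
    day, month, year = (int(g) for g in m.groups())
    if not 1970 <= year <= 2022:
        return m.group()
    try:
        # date() raises ValueError for impossible dates such as 31 February
        return date(year, month, day).isoformat()
    except ValueError:
        return m.group()

def convert(text):
    return DATE_RE.sub(strict_repl, text)

print(convert("Dates 1/2/1970 and 31/02/2000"))
```

Here 1/2/1970 becomes 1970-02-01, while 31/02/2000 is left as it was because no such calendar date exists.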
