
Convert dates in Python from a text file with text

I have a text file (UTF-8) containing a lot of text and dates in the format dd/mm/yyyy, with years ranging from 1970 to 2022. I want to read this file and convert the dates to yyyy-mm-dd format while keeping all the other text as it is. Do you know how to do this with Python? I don't mind using another tool (such as awk or sed), as long as the rest of the file is not affected.

Optionally, I also want to search for dates without leading zeros in the day or month and convert them too. But first I want to display them (I'm not sure whether there are any such dates).

It's important not to convert other strings, so if the "year" is not between 1970 and 2022, the string must not be converted.

I wrote this program, but it needs debugging; I don't know how to write the _repl function properly.

import re
import io


def _repl(s):
    x = s.split("/")
    if ((len(x) == 3) and (0 < int(x[0]) <= 31) and (0 < int(x[1]) <= 12) and (1970 <= int(x[2]) <= 2022)):
        return "{:04d}-{:02d}-{:02d}".format(int(x[2]), int(x[1]), int(x[0]))
    return x


with io.open("1.txt", mode="r", encoding="utf-8") as f:
    b = f.readlines()

c = list()
for line in b:
    _line = ""
    while (not (_line == line)):
        # _line = re.sub(pattern=r'([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})', repl=_repl, string=line)
        _line = re.sub(pattern=r'([0-9]{2})/([0-9]{2})/([0-9]{4})', repl=_repl, string=line)
    c.append(_line)

with io.open('2.txt', mode='w', encoding="utf-8") as f:
    for line in c:
        f.write("{}".format(line))

The repl function receives a match object as its single argument. Consider the following simple example:

import re

def repl(m):
    day, month, year = m.groups()
    return '-'.join([year, month, day]) if 1970 <= int(year) <= 2022 else m.group()

text1 = "Date 01/01/1901 and 01/01/3001 are outside range"
text2 = "Year 2000 should not be changed"
text3 = "Date 01/12/1970 shall be changed"
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text1))
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text2))
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text3))

Output:

Date 01/01/1901 and 01/01/3001 are outside range
Year 2000 should not be changed
Date 1970-12-01 shall be changed

Explanation: I use unpacking to get day, month, and year, then check whether the year (as a number) lies in the range [1970, 2022]. If it does, I build a hyphen-separated year-month-day string; otherwise I return what was matched, unchanged.
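One edge case worth noting: a pattern like the one above can also match a date-shaped substring inside a longer run of digits. A minimal sketch of one way to guard against that, assuming such digit runs should be left alone, is to add negative lookarounds. The names DATE_RE and convert are illustrative and not part of the original answer.

```python
import re

# (?<!\d) and (?!\d) stop the pattern from matching inside a longer run of
# digits, e.g. the "01/12/1970" hidden inside "101/12/19705".
DATE_RE = re.compile(r'(?<!\d)(\d{2})/(\d{2})/(\d{4})(?!\d)')


def repl(m):
    day, month, year = m.groups()
    if 1970 <= int(year) <= 2022:
        return '-'.join([year, month, day])
    return m.group()  # leave anything else exactly as it was matched


def convert(text):
    # Hypothetical helper: apply the guarded pattern to one string.
    return DATE_RE.sub(repl, text)


print(convert("Date 01/12/1970 shall be changed"))
print(convert("Code 101/12/19705 should not be changed"))
```

Without the lookarounds, the second string would have its embedded "01/12/1970" rewritten, which changes text that is not a date.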

Thanks to @Daweo for your answer. I just changed it a bit so that it also accepts dates without leading zeros in the day or the month.

This is my program:

import re
import io


def _repl(m):
    day, month, year = m.groups()
    if ((0 < int(day) <= 31) and (0 < int(month) <= 12) and (1970 <= int(year) <= 2022)):
        return "{:04d}-{:02d}-{:02d}".format(int(year), int(month), int(day))
    else:
        return m.group()


with io.open("1.txt", mode="r", encoding="utf-8") as f:
    b = f.readlines()

c = list()
for line in b:
    _line = re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=line)
    c.append(_line)

with io.open('2.txt', mode='w', encoding="utf-8") as f:
    for line in c:
        f.write("{}".format(line))

Here are some texts for testing:

text1 = "Date 01/01/1901 and 01/01/3001 are outside range"
text2 = "Year 2000 should not be changed"
text3 = "Date 01/12/1970 shall be changed"
text4 = "'Date' 99/99/1970 should not be changed"
text5 = "'Date' 13/13/1970 should not be changed"
text6 = "Date 1/2/1970 shall be changed"
text7 = "Dates 01/12/1972,04/11/2022 shall be changed"
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text1))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text2))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text3))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text4))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text5))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text6))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text7))

And the output is:

Date 01/01/1901 and 01/01/3001 are outside range
Year 2000 should not be changed
Date 1970-12-01 shall be changed
'Date' 99/99/1970 should not be changed
'Date' 13/13/1970 should not be changed
Date 1970-02-01 shall be changed
Dates 1972-12-01,2022-11-04 shall be changed
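The question also asked to display dates without leading zeros before converting them. A rough sketch of how that could be done with re.finditer follows; the function name find_unpadded_dates and the sample lines are invented for illustration, and the lookarounds are an added assumption to avoid matching inside longer digit runs.

```python
import re

# Same date shape as above, guarded so it does not match inside longer digit runs.
DATE_RE = re.compile(r'(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4})(?!\d)')


def find_unpadded_dates(lines):
    """Yield (line_number, matched_text) for in-range dates lacking a leading zero."""
    for lineno, line in enumerate(lines, start=1):
        for m in DATE_RE.finditer(line):
            day, month, year = m.groups()
            if (len(day) == 1 or len(month) == 1) and 1970 <= int(year) <= 2022:
                yield lineno, m.group()


# Made-up sample lines; a real run would iterate over the open file instead.
sample = [
    "Date 1/2/1970 here",
    "Date 01/12/1970 here",
    "Dates 3/11/2022 and 12/5/1999",
]
for lineno, text in find_unpadded_dates(sample):
    print(lineno, text)
```

Because the function accepts any iterable of lines, a file object opened with encoding="utf-8" (for example the question's 1.txt) can be passed in directly, and nothing is modified while the matches are listed.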
