Convert dates in a text file with Python, leaving the rest of the text unchanged
I have a text file (UTF-8 encoded) with lots of text and dates in the format dd/mm/yyyy, with years ranging from 1970 to 2022. I want to read this file and convert the dates to yyyy-mm-dd format, while keeping all the other text as it is. Do you know how to do this with Python? I don't mind using another tool (such as awk or sed), as long as the rest of the file is not affected.
Optionally, I also want to search for dates without leading zeros in the day or month and convert them too. But first I want to display them (I'm not sure whether there are any such dates).
It's important not to convert other strings: if the "year" is not between 1970 and 2022, the string must be left unchanged.
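For the "display them first" step, here is a minimal sketch, assuming the file content is available as a string. It lists d/m/yyyy dates in which the day or the month has no leading zero and the year is between 1970 and 2022. The helper name find_short_dates and the \b word boundaries are my own additions, not something from the question.

```python
import re

# \b word boundaries (an assumption, not in the question) avoid matching
# inside longer digit runs such as "123/4/19701".
SHORT_DATE = re.compile(r'\b(\d{1,2})/(\d{1,2})/(\d{4})\b')

def find_short_dates(text):
    """Hypothetical helper: return (line_number, date_text) pairs for dates
    whose day or month lacks a leading zero and whose year is 1970..2022."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in SHORT_DATE.finditer(line):
            day, month, year = m.groups()
            if len(day) == 2 and len(month) == 2:
                continue  # already zero-padded, not a "short" date
            if 1970 <= int(year) <= 2022:
                found.append((lineno, m.group()))
    return found

sample = "Born 1/2/1970, moved 01/02/1975\nLeft 3/12/1999 (not 5/5/1960)"
for lineno, date_text in find_short_dates(sample):
    print(lineno, date_text)
# 1 1/2/1970
# 2 3/12/1999
```

To inspect the real file, pass its content (for example, the result of reading 1.txt with encoding="utf-8") to find_short_dates instead of the sample string.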
I wrote this program, but it needs debugging; I don't know how to write the _repl function properly.
import re
import io

def _repl(s):
    x = s.split("/")
    if ((len(x) == 3) and (0 < int(x[0]) <= 31) and (0 < int(x[1]) <= 12) and (1970 <= int(x[2]) <= 2022)):
        return "{:04d}-{:02d}-{:02d}".format(int(x[2]), int(x[1]), int(x[0]))
    return x

with io.open("1.txt", mode="r", encoding="utf-8") as f:
    b = f.readlines()
c = list()
for line in b:
    _line = ""
    while (not (_line == line)):
        # _line = re.sub(pattern=r'([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})', repl=_repl, string=line)
        _line = re.sub(pattern=r'([0-9]{2})/([0-9]{2})/([0-9]{4})', repl=_repl, string=line)
    c.append(_line)

with io.open('2.txt', mode='w', encoding="utf-8") as f:
    for line in c:
        f.write("{}".format(line))
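A side note on the while loop above: re.sub already replaces every non-overlapping match in a single call (its count argument defaults to 0, meaning "replace all"), so no loop is needed. As written, the loop also re-applies re.sub to the unchanged line on every pass, so once a working _repl changes anything, _line can never equal line and the loop never ends. The sketch below uses re.subn, which returns the new string together with the number of replacements, to show that one call handles both dates on a line; the lambda replacement is only for illustration and skips the year check.

```python
import re

pattern = r'(\d{2})/(\d{2})/(\d{4})'
line = "Dates 01/12/1972,04/11/2022 shall be changed"

# m[1], m[2], m[3] are the day, month and year groups of each match.
new_line, count = re.subn(pattern, lambda m: f"{m[3]}-{m[2]}-{m[1]}", line)
print(new_line)  # Dates 1972-12-01,2022-11-04 shall be changed
print(count)     # 2
```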
The repl function receives a match object as its single argument, not a string. Consider the following simple example:
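To see what repl actually receives, re.search can be used to produce the same kind of match object in isolation (the date string here is made up for illustration):

```python
import re

m = re.search(r'(\d{2})/(\d{2})/(\d{4})', "Due 04/11/2022.")
print(m.group())    # 04/11/2022  (the whole match)
print(m.groups())   # ('04', '11', '2022')
print(m.group(3))   # 2022
print(m.span())     # (4, 14)
```

So a repl function should read the pieces from m.groups() or m.group(n); calling a string method such as split on m raises AttributeError, which is the problem with the _repl in the question.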
import re

def repl(m):
    day, month, year = m.groups()
    return '-'.join([year, month, day]) if 1970 <= int(year) <= 2022 else m.group()

text1 = "Date 01/01/1901 and 01/01/3001 are outside range"
text2 = "Year 2000 should not be changed"
text3 = "Date 01/12/1970 shall be changed"

print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text1))
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text2))
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text3))
Output:
Date 01/01/1901 and 01/01/3001 are outside range
Year 2000 should not be changed
Date 1970-12-01 shall be changed
Explanation: I unpack m.groups() into day, month and year, then check whether the year (as a numeric value) is inside the range [1970, 2022]. If it is, I return year, month and day joined with "-"; otherwise I return the matched text as-is.
Thanks to @Daweo for the answer. I changed it a bit so that it also accepts dates without leading zeros in the day or the month.

This is my program:
import re
import io

def _repl(m):
    day, month, year = m.groups()
    if ((0 < int(day) <= 31) and (0 < int(month) <= 12) and (1970 <= int(year) <= 2022)):
        return "{:04d}-{:02d}-{:02d}".format(int(year), int(month), int(day))
    else:
        return m.group()

with io.open("1.txt", mode="r", encoding="utf-8") as f:
    b = f.readlines()
c = list()
for line in b:
    _line = re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=line)
    c.append(_line)

with io.open('2.txt', mode='w', encoding="utf-8") as f:
    for line in c:
        f.write("{}".format(line))
Here are some texts for testing:
text1 = "Date 01/01/1901 and 01/01/3001 are outside range"
text2 = "Year 2000 should not be changed"
text3 = "Date 01/12/1970 shall be changed"
text4 = "'Date' 99/99/1970 should not be changed"
text5 = "'Date' 13/13/1970 should not be changed"
text6 = "Date 1/2/1970 shall be changed"
text7 = "Dates 01/12/1972,04/11/2022 shall be changed"
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text1))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text2))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text3))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text4))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text5))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text6))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text7))
And the output is:
Date 01/01/1901 and 01/01/3001 are outside range
Year 2000 should not be changed
Date 1970-12-01 shall be changed
'Date' 99/99/1970 should not be changed
'Date' 13/13/1970 should not be changed
Date 1970-02-01 shall be changed
Dates 1972-12-01,2022-11-04 shall be changed