Convert dates in Python from a text file with text
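I have a text file (UTF-8 encoded) that contains a lot of text along with dates in dd/mm/yyyy format, with years ranging from 1970 to 2022. I want to read this file and convert the dates to yyyy-mm-dd format while leaving all other text unchanged. Do you know how to do this in Python? I don't mind using other tools (such as awk or sed) either, as long as the rest of the file is not affected.
I would also like to search for dates that lack a leading zero in the day or month and convert those as well. But first I want to display them (I'm not sure whether any such dates exist).
It is important not to convert other strings, so if the "year" is not between 1970 and 2022, the string must not be converted.
I wrote this program, but it needs debugging, and I don't know how to write the _repl function correctly.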
import re
import io
def _repl(s):
    x = s.split("/")
    if ((len(x) == 3) and (0 < int(x[0]) <= 31) and (0 < int(x[1]) <= 12) and (1970 <= int(x[2]) <= 2022)):
        return "{:04d}-{:02d}-{:02d}".format(int(x[2]), int(x[1]), int(x[0]))
    return x
with io.open("1.txt", mode="r", encoding="utf-8") as f:
    b = f.readlines()
c = list()
for line in b:
    _line = ""
    while (not (_line == line)):
        # _line = re.sub(pattern=r'([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})', repl=_repl, string=line)
        _line = re.sub(pattern=r'([0-9]{2})/([0-9]{2})/([0-9]{4})', repl=_repl, string=line)
    c.append(_line)
with io.open('2.txt', mode='w', encoding="utf-8") as f:
    for line in c:
        f.write("{}".format(line))
The repl function receives a match object as its single argument. Consider the following simple example:
import re
def repl(m):
    day, month, year = m.groups()
    return '-'.join([year, month, day]) if 1970 <= int(year) <= 2022 else m.group()
text1 = "Date 01/01/1901 and 01/01/3001 are outside range"
text2 = "Year 2000 should not be changed"
text3 = "Date 01/12/1970 shall be changed"
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text1))
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text2))
print(re.sub(r'(\d{2})/(\d{2})/(\d{4})', repl, text3))
Output:
Date 01/01/1901 and 01/01/3001 are outside range
Year 2000 should not be changed
Date 1970-12-01 shall be changed
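Explanation: I use tuple unpacking to get the day, month, and year, then check whether the year (as a numeric value) lies in the range [1970, 2022]. If it does, I join the year, month, and day with hyphens; otherwise I return the matched text unchanged.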
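The question also asks to display dates that lack a leading zero before converting them, which the example above does not cover. A minimal sketch of that step follows. The helper name `find_unpadded_dates`, the sample text, and the digit lookarounds in the pattern are illustrative assumptions, not part of the answer.

```python
import re

# Hypothetical helper, not from the thread: lists d/m/yyyy dates in which the
# day or the month is written with a single digit (no leading zero).
# The lookarounds keep the pattern from matching inside longer digit runs.
DATE_RE = re.compile(r'(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4})(?!\d)')

def find_unpadded_dates(text):
    """Return (line_number, date_string) pairs for dates lacking a leading zero."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in DATE_RE.finditer(line):
            day, month, year = m.groups()
            # Same year window as the question: 1970 to 2022 inclusive.
            if (len(day) == 1 or len(month) == 1) and 1970 <= int(year) <= 2022:
                found.append((lineno, m.group()))
    return found

sample = "Date 1/2/1970 here\nDate 01/12/1970 there\nDates 3/11/1999 and 04/5/3001"
for lineno, date_text in find_unpadded_dates(sample):
    print(lineno, date_text)
```

Run on the sample text, this prints `1 1/2/1970` and `3 3/11/1999`. The fully padded date and the out-of-range year are skipped.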
Thanks to @Daweo for the answer. I changed it slightly so that it also accepts dates without a leading zero in the day or month.
Here is my program:
import re
import io
def _repl(m):
    day, month, year = m.groups()
    if ((0 < int(day) <= 31) and (0 < int(month) <= 12) and (1970 <= int(year) <= 2022)):
        return "{:04d}-{:02d}-{:02d}".format(int(year), int(month), int(day))
    else:
        return m.group()
with io.open("1.txt", mode="r", encoding="utf-8") as f:
    b = f.readlines()
c = list()
for line in b:
    _line = re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=line)
    c.append(_line)
with io.open('2.txt', mode='w', encoding="utf-8") as f:
    for line in c:
        f.write("{}".format(line))
Here are some test texts:
text1 = "Date 01/01/1901 and 01/01/3001 are outside range"
text2 = "Year 2000 should not be changed"
text3 = "Date 01/12/1970 shall be changed"
text4 = "'Date' 99/99/1970 should not be changed"
text5 = "'Date' 13/13/1970 should not be changed"
text6 = "Date 1/2/1970 shall be changed"
text7 = "Dates 01/12/1972,04/11/2022 shall be changed"
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text1))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text2))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text3))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text4))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text5))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text6))
print(re.sub(pattern=r'(\d{1,2})/(\d{1,2})/(\d{4})', repl=_repl, string=text7))
The output is:
Date 01/01/1901 and 01/01/3001 are outside range
Year 2000 should not be changed
Date 1970-12-01 shall be changed
'Date' 99/99/1970 should not be changed
'Date' 13/13/1970 should not be changed
Date 1970-02-01 shall be changed
Dates 1972-12-01,2022-11-04 shall be changed
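Two cases may still slip through the range checks above. A calendar-impossible date such as 31/02/2000 passes the day and month limits, and the pattern can match inside a longer run of digits. The sketch below is one possible stricter variant, under stated assumptions: the name `strict_repl` and the lookarounds are additions that do not appear in the thread, and validation is delegated to `datetime.date`, which raises `ValueError` for dates that do not exist.

```python
import re
from datetime import date

# Lookarounds (an addition, not in the thread) stop the pattern from matching
# inside longer digit runs such as "101/01/19701".
DATE_RE = re.compile(r'(?<!\d)(\d{1,2})/(\d{1,2})/(\d{4})(?!\d)')

def strict_repl(m):
    """Hypothetical stricter replacement callback for re.sub."""
    day, month, year = (int(g) for g in m.groups())
    if not 1970 <= year <= 2022:
        return m.group()  # year outside the window: leave the text untouched
    try:
        # date() raises ValueError for impossible dates like 31/02/2000
        return date(year, month, day).isoformat()
    except ValueError:
        return m.group()

text = "Valid 1/2/1970, invalid 31/02/2000, digits 101/01/19701"
print(DATE_RE.sub(strict_repl, text))
```

With this sample, only `1/2/1970` is rewritten (to `1970-02-01`), and the other two strings are left as they were. Whether impossible dates should be left alone or reported is a design choice that depends on the file being cleaned.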