
Awk - filter only dates with certain format from text file

I have a .txt file with many lines of text on macOS. I would like to filter out only the dates and save them, in order of appearance, one per line in a new text file.

I am, however, not interested in all dates, only in those that are complete, looking like 02/03/2019, and where the number of the day is below 13, i.e. 01...12.

Then, I would like to have those dates removed where the numbers for the day and the month are the same, like 01/01/2019, 02/02/2019, etc.

How can I achieve this with awk or similar software in bash?

If perl is an option:

perl -ne 'print if m:(\d\d)/(\d\d)/(\d\d\d\d): && $1 < 13 && $1 != $2' dates.txt >newdates.txt

This assumes the format dd/mm/yyyy.

Note that I am using the m: : notation instead of the usual / / for regex matching. Thus I do not need to escape the / slashes in the date.

Deleting Dates Inside a Text File

The following command deletes all dates of the form✱ aa/bb/cccc where aa = bb < 13. The original file is copied to yourFile.txt.bak as a backup, and the new text with the dates deleted overwrites the old file.

sed -E -i.bak 's:\b(0[0-9]|1[0-2])/\1/[0-9]{4}\b::g' yourFile.txt

If you want to insert something instead of just deleting the dates, you can do so by writing the replacement between the two colons :: . For instance, sed … 's:…:deleted date:g' … will replace each matching date with the text deleted date .

✱ Note that it doesn't matter for your criterion whether the date format is dd/mm/yyyy or mm/dd/yyyy, since you are only interested in dates where dd and mm are equal.
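Before letting -i overwrite the file, it can help to preview the result. The following is a minimal sketch, assuming GNU sed (other sed implementations may not understand \b): without -i, sed writes the edited text to stdout and leaves the file untouched. The function name preview_deletion and the sample line are hypothetical.

```shell
# Hypothetical helper: show what the deletion would produce without
# modifying any file (same expression as above, but no -i option).
preview_deletion() {
  sed -E 's:\b(0[0-9]|1[0-2])/\1/[0-9]{4}\b::g' "$@"
}

# Sample input (invented for this sketch).
printf '%s\n' 'a 01/01/2019 b 02/03/2019 c 12/12/2020 d' | preview_deletion
# prints: a  b 02/03/2019 c  d
```

Only the dates are removed; the spaces around them stay, which is why the output contains double spaces.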

Extracting Specific Dates From A Text File

If you do not want to delete, but only extract specific dates as mentioned in your comment, you can use the following command.

grep -Eo '\b([0-9]{2}/){2}[0-9]{4}\b' yourFile.txt | awk -F/ '$1<13 && $1!=$2'

This will extract all dates in dd/mm/yyyy (!) format where dd < 13 and dd ≠ mm. The dates are printed to stdout in order of appearance. If you want to save them to a file, append > yourOutputFile.txt to the end of the command.
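Since the question asks for awk, the same extraction can probably be done in awk alone. The sketch below is one possible way, not a tested replacement for the pipeline above: it uses match() with RSTART and RLENGTH to walk over each line. The function name extract_dates_awk and the sample text are invented for illustration. Unlike the grep pattern, it has no \b word-boundary check, so it assumes dates are not embedded in longer runs of digits.

```shell
# Hypothetical helper: awk-only extraction of dd/mm/yyyy dates
# with dd < 13 and dd != mm, one date per output line.
extract_dates_awk() {
  awk '{
    line = $0
    # Repeatedly find the next dd/mm/yyyy-shaped token on the line.
    # Digits are spelled out because some awks lack {n} intervals.
    while (match(line, /[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]/)) {
      date = substr(line, RSTART, RLENGTH)
      split(date, part, "/")
      # Keep the date when the day is below 13 and differs from the month.
      if (part[1] + 0 < 13 && part[1] != part[2])
        print date
      # Continue searching in the rest of the line.
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}

# Sample input (invented for this sketch).
printf '%s\n' 'Met on 02/03/2019 and 05/05/2019.' 'Due 25/12/2019, then 11/12/2020.' |
  extract_dates_awk
# prints:
# 02/03/2019
# 11/12/2020
```

As with the pipeline, append > yourOutputFile.txt to save the dates to a file.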
