Using awk to subtract a fixed number of days from a date column and add the result as new columns

Question

Let's assume we have a file with the values shown below:

% head test.csv
20220601,A,B,1
20220530,A,B,1

We want to add two new columns, one with the date minus 7 days and one with the date minus 1 day, resulting in the following:

% head new_test.csv
20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1

The awk command used to produce the above is:

% awk 'BEGIN{FS=OFS=","} { a="date -d \"$(date -d \""$1"\") -7 days\" +'%Y%m%d'"; a | getline st; close(a);b="date -d \"$(date -d \""$1"\") -1 days\" +'%Y%m%d'"; b | getline cb; close(b);print $1","$2","$3","st","cb","$4}' test.csv > new_test.csv

But when applied to a large file with more than 100K lines, it runs for 20 minutes. Is there any way to optimize the awk?

Answer 1

One GNU awk approach:

awk '
BEGIN { FS=OFS=","
        secs_in_day = 60 * 60 * 24
      }
      { dt = mktime( substr($1,1,4) " " substr($1,5,2) " " substr($1,7,2) " 12 0 0" )
        dt1 = strftime("%Y%m%d",dt -  secs_in_day      )
        dt7 = strftime("%Y%m%d",dt - (secs_in_day * 7) )
        print $1,$2,$3,dt7,dt1,$4
      }
' test.csv

This generates:

20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1

NOTES:

requires GNU awk for the mktime() and strftime() functions; see GNU awk time functions for more details
other flavors of awk may have similar functions, ymmv
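
For an awk without mktime() and strftime(), the same two columns can be computed with plain integer arithmetic. The following is a minimal sketch, not part of the original answer: the names minus_days, dim and sub_days are made up for illustration, and it assumes the first field is always a valid YYYYMMDD date.

```shell
#!/bin/sh
# Hypothetical helper: add "date - 7 days" and "date - 1 day" columns using
# only integer arithmetic, so it should run in any POSIX awk.
minus_days() {
    awk '
    # Days in month m of year y, using Gregorian leap-year rules.
    function dim(y, m) {
        if (m == 2) return (y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)) ? 29 : 28
        return (m == 4 || m == 6 || m == 9 || m == 11) ? 30 : 31
    }
    # Return ymd (YYYYMMDD) shifted back by n days (n >= 0).
    function sub_days(ymd, n,    y, m, d) {
        y = substr(ymd, 1, 4) + 0
        m = substr(ymd, 5, 2) + 0
        d = substr(ymd, 7, 2) - n
        while (d < 1) {              # borrow days from the previous month
            m--
            if (m < 1) { m = 12; y-- }
            d += dim(y, m)
        }
        return sprintf("%04d%02d%02d", y, m, d)
    }
    BEGIN { FS = OFS = "," }
    { print $1, $2, $3, sub_days($1, 7), sub_days($1, 1), $4 }
    ' "$@"
}
```

Usage would look like: minus_days test.csv > new_test.csv. No external process is started per line, so the run time should scale with the file size only.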

Answer 2

You can try wrapping the date calls in awk functions; in my test this was faster than the original awk.

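awk -F, '
    # Return the date one day before "date" (needs GNU date for -d).
    function cmd1(date,    a, st){
        a="date -d \"$(date -d \""date"\") -1days\" +'%Y%m%d'"
        a | getline st
        close(a)    # must come before return, otherwise it never runs
        return st
    }
    # Return the date seven days before "date".
    function cmd2(date,    b, cm){
        b="date -d \"$(date -d \""date"\") -7days\" +'%Y%m%d'"
        b | getline cm
        close(b)    # must come before return, otherwise it never runs
        return cm
    }
    {
        $5=cmd1($1)
        $6=cmd2($1)
        # note: this prints the -1 day column before the -7 day column
        print $1","$2","$3","$5","$6","$4
    }' OFS=, test > newFileTest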

I ran this against a file with 20000 records and it finished in seconds, compared to the original awk, which took around 5 minutes.
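
Because every call to the external date command starts new processes, one further option is to call it only once per distinct date value and reuse the result. The sketch below is an illustration rather than the answer's own method: the names add_date_cols, shifted and cache are hypothetical, and it assumes GNU date, which accepts input such as "20220601 -7 days". The gain depends on how often dates repeat in the file.

```shell
#!/bin/sh
# Sketch: run the external date command once per distinct (date, offset)
# pair and reuse the cached result. Assumes GNU date (for -d).
add_date_cols() {
    awk '
    # Return ymd shifted by "days" days, consulting the cache first.
    function shifted(ymd, days,    key, cmd, out) {
        key = ymd SUBSEP days
        if (!(key in cache)) {
            cmd = "date -d \"" ymd " " days " days\" +%Y%m%d"
            if ((cmd | getline out) > 0) cache[key] = out
            else cache[key] = ""     # date failed; leave the column empty
            close(cmd)               # close the pipe before returning
        }
        return cache[key]
    }
    BEGIN { FS = OFS = "," }
    { print $1, $2, $3, shifted($1, -7), shifted($1, -1), $4 }
    ' "$@"
}
```

Usage would look like: add_date_cols test.csv > new_test.csv. With 100K lines spread over a few hundred distinct dates, date would run a few hundred times instead of several hundred thousand.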

Answer 1
4 2023-01-09 16:02:14

Answer 2
1 Accepted 2023-01-10 09:21:42
