Subtract fixed number of days from date column using awk and add it to new column

Let's assume that we have a file with the values shown below:

% head test.csv
20220601,A,B,1
20220530,A,B,1

We want to add two new columns, one with the date minus 7 days and one with the date minus 1 day, resulting in the following:

% head new_test.csv
20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1

The awk that was used to produce the above is:

% awk 'BEGIN{FS=OFS=","} { a="date -d \"$(date -d \""$1"\") -7 days\" +'%Y%m%d'"; a | getline st; close(a);b="date -d \"$(date -d \""$1"\") -1 days\" +'%Y%m%d'"; b | getline cb; close(b);print $1","$2","$3","st","cb","$4}' test.csv > new_test.csv

But when applied to a large file with more than 100K lines, it runs for 20 minutes. Is there any way to optimize the awk?

One GNU awk approach:

awk '
BEGIN { FS=OFS=","
        secs_in_day = 60 * 60 * 24
      }
      { dt = mktime( substr($1,1,4) " " substr($1,5,2) " " substr($1,7,2) " 12 0 0" )
        dt1 = strftime("%Y%m%d",dt -  secs_in_day      )
        dt7 = strftime("%Y%m%d",dt - (secs_in_day * 7) )
        print $1,$2,$3,dt7,dt1,$4
      }
' test.csv

This generates:

20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1
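One detail of the script above worth spelling out is the `12 0 0` passed to mktime(). The answer does not explain it; my reading is that anchoring each date at noon keeps the result on the intended calendar day even when a daylight-saving change makes one of the intervening days 23 or 25 hours long. A minimal sketch of the same arithmetic for a single date follows. The helper name `shift_date` is hypothetical, and it assumes `awk` resolves to GNU awk or another awk that provides mktime() and strftime():

```shell
# Hypothetical helper: shift a YYYYMMDD date by N days (N may be negative).
shift_date() {
    awk -v d="$1" -v n="$2" '
    BEGIN {
        # Build a timestamp at 12:00:00 local time on the given date.
        ts = mktime(substr(d,1,4) " " substr(d,5,2) " " substr(d,7,2) " 12 0 0")
        # Move by whole 86400-second days; starting from noon, an hour of
        # DST drift cannot push the result onto a neighbouring date.
        print strftime("%Y%m%d", ts + n * 86400)
    }'
}

shift_date 20220601 -7   # prints 20220525
shift_date 20220301 -1   # prints 20220228
```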

NOTES:

  • requires GNU awk for the mktime() and strftime() functions; see GNU awk time functions for more details
  • other flavors of awk may have similar functions, ymmv
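If the awk at hand has no time functions at all, the date arithmetic can also be done with plain integer math. The sketch below is not part of the answer above: it swaps in the standard Julian Day Number conversion for the Gregorian calendar, and the names `add_date_cols`, `to_jdn` and `from_jdn` are hypothetical. It assumes every first field is a valid YYYYMMDD date.

```shell
# Hypothetical filter: reads CSV on stdin, writes CSV with the two new columns.
add_date_cols() {
    awk '
    # Gregorian calendar date -> Julian Day Number (integer day count).
    function to_jdn(y, m, d,    a, yy, mm) {
        a  = int((14 - m) / 12)      # 1 for January and February, else 0
        yy = y + 4800 - a
        mm = m + 12 * a - 3          # March = 0 ... February = 11
        return d + int((153 * mm + 2) / 5) + 365 * yy \
                 + int(yy / 4) - int(yy / 100) + int(yy / 400) - 32045
    }
    # Julian Day Number -> YYYYMMDD string.
    function from_jdn(j,    a, b, c, d, e, m) {
        a = j + 32044
        b = int((4 * a + 3) / 146097)
        c = a - int(146097 * b / 4)
        d = int((4 * c + 3) / 1461)
        e = c - int(1461 * d / 4)
        m = int((5 * e + 2) / 153)
        return sprintf("%04d%02d%02d",
                       100 * b + d - 4800 + int(m / 10),
                       m + 3 - 12 * int(m / 10),
                       e - int((153 * m + 2) / 5) + 1)
    }
    BEGIN { FS = OFS = "," }
    {
        j = to_jdn(substr($1,1,4) + 0, substr($1,5,2) + 0, substr($1,7,2) + 0)
        print $1, $2, $3, from_jdn(j - 7), from_jdn(j - 1), $4
    }'
}

printf '20220601,A,B,1\n20220530,A,B,1\n' | add_date_cols
# prints:
# 20220601,A,B,20220525,20220531,1
# 20220530,A,B,20220523,20220529,1
```

Because no external process is started and no time zone is involved, this should scale to large files about as well as the mktime() version.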

You can try using function calls; in my test this was faster than the original:

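awk -F, '
    # Return the date 1 day before "date" (YYYYMMDD); relies on GNU date -d.
    # a and st are declared as extra parameters so they stay local.
    function cmd1(date,    a, st){
        a="date -d \"$(date -d \""date"\") -1days\" +'%Y%m%d'"
        a | getline st
        close(a)    # must come before return, otherwise it is never reached
        return st
    }
    # Return the date 7 days before "date" (YYYYMMDD).
    function cmd2(date,    b, cm){
        b="date -d \"$(date -d \""date"\") -7days\" +'%Y%m%d'"
        b | getline cm
        close(b)
        return cm
    }
    {
        $5=cmd1($1)
        $6=cmd2($1)
        # 7-day column first, then 1-day, to match the requested output
        print $1","$2","$3","$6","$5","$4
    }' OFS=, test > newFileTest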

I executed this against a file with 20000 records in seconds, compared to the original awk, which took around 5 minutes.
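When the awk script shells out to `date`, the cost is most likely dominated by starting a new process for every call, so one further option is to call `date` only once per distinct input date and offset, and to reuse the result from an awk array. The following is a sketch of that idea, not the method used above. The names `add_date_cols_cached`, `shifted` and `cache` are hypothetical, and it assumes GNU `date` (for `-d`) and input that is trusted, since the first field is pasted into a shell command:

```shell
# Hypothetical filter: reads CSV on stdin, writes CSV with the two new columns.
add_date_cols_cached() {
    awk '
    # Return date d (YYYYMMDD) shifted by n days, asking GNU date at most
    # once per distinct (d, n) pair and caching the answer.
    function shifted(d, n,    cmd, out, key) {
        key = d SUBSEP n
        if (!(key in cache)) {
            cmd = "date -d \"" d " " n " days\" +%Y%m%d"
            out = ""
            cmd | getline out
            close(cmd)               # release the pipe right away
            cache[key] = out
        }
        return cache[key]
    }
    BEGIN { FS = OFS = "," }
    { print $1, $2, $3, shifted($1, -7), shifted($1, -1), $4 }
    '
}

printf '20220601,A,B,1\n20220530,A,B,1\n20220601,C,D,2\n' | add_date_cols_cached
```

The gain depends on how often dates repeat: a file with 100K rows spread over a few hundred distinct dates would need only a few hundred `date` calls, while a file in which every date is unique would see no benefit.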
