Subtract fixed number of days from date column using awk and add it to new column
Let's assume that we have a file with the values as seen below:
% head test.csv
20220601,A,B,1
20220530,A,B,1
And we want to add two new columns, one with the date minus 7 days and one with the date minus 1 day, resulting in the following:
% head new_test.csv
20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1
The awk command that was used to produce the above is:
% awk 'BEGIN{FS=OFS=","} { a="date -d \"$(date -d \""$1"\") -7 days\" +'%Y%m%d'"; a | getline st; close(a);b="date -d \"$(date -d \""$1"\") -1 days\" +'%Y%m%d'"; b | getline cb; close(b);print $1","$2","$3","st","cb","$4}' test.csv > new_test.csv
But after applying the above to a large file with more than 100K lines, it runs for 20 minutes. Is there any way to optimize the awk?
One GNU awk approach:
awk '
BEGIN { FS=OFS=","
secs_in_day = 60 * 60 * 24
}
{ dt = mktime( substr($1,1,4) " " substr($1,5,2) " " substr($1,7,2) " 12 0 0" )
dt1 = strftime("%Y%m%d",dt - secs_in_day )
dt7 = strftime("%Y%m%d",dt - (secs_in_day * 7) )
print $1,$2,$3,dt7,dt1,$4
}
' test.csv
This generates:
20220601,A,B,20220525,20220531,1
20220530,A,B,20220523,20220529,1
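Where an awk without mktime() and strftime() has to be used, the same day arithmetic can be done on integer day counts instead. The following is only a minimal sketch, not part of the answer above: the names shift_dates_portable, days_from_civil and civil_from_days are hypothetical, and it assumes column 1 always holds a valid Gregorian date in YYYYMMDD form. The conversion is a commonly used civil-calendar day-count algorithm.

```shell
# Hypothetical wrapper: reads CSV from stdin or from the files given as arguments.
shift_dates_portable() {
    awk '
    BEGIN { FS = OFS = "," }

    # Days since 1970-01-01 for a Gregorian date (positive years assumed).
    function days_from_civil(y, m, d,    era, yoe, doy, doe) {
        if (m <= 2) y--                      # count the year as starting in March
        era = int(y / 400)                   # 400-year cycle index
        yoe = y - era * 400                  # year within the cycle
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }

    # Inverse of days_from_civil(); returns the date as YYYYMMDD.
    function civil_from_days(z,    era, doe, yoe, y, doy, mp, d, m) {
        z += 719468
        era = int(z / 146097)
        doe = z - era * 146097
        yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
        y = yoe + era * 400
        doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
        mp = int((5 * doy + 2) / 153)        # month index counted from March
        d = doy - int((153 * mp + 2) / 5) + 1
        m = (mp < 10) ? mp + 3 : mp - 9
        if (m <= 2) y++
        return sprintf("%04d%02d%02d", y, m, d)
    }

    {
        # "+ 0" forces numeric values, because substr() returns strings.
        n = days_from_civil(substr($1, 1, 4) + 0, substr($1, 5, 2) + 0, substr($1, 7, 2) + 0)
        print $1, $2, $3, civil_from_days(n - 7), civil_from_days(n - 1), $4
    }
    ' "$@"
}

printf '20220601,A,B,1\n20220530,A,B,1\n' | shift_dates_portable
```

Because no external process is started and no time zone is involved, this variant should scale about as well as the mktime() version, at the cost of carrying the calendar arithmetic in the script.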
NOTES:
GNU awk is required for the mktime() and strftime() functions; see GNU awk time functions for more details
other flavors of awk may have similar functions, ymmv

You can try wrapping the date calls in functions, which ran faster for me:
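awk -F, '
# Each function runs GNU date for one input date; a/st and b/cm are local variables.
function cmd1(date,    a, st){
a="date -d \"$(date -d \""date"\") -1days\" +'%Y%m%d'"
a | getline st
close(a)   # close before returning, otherwise the pipe is never closed
return st
}
function cmd2(date,    b, cm){
b="date -d \"$(date -d \""date"\") -7days\" +'%Y%m%d'"
b | getline cm
close(b)
return cm
}
{
# minus 7 days first, then minus 1 day, to match the expected output
print $1, $2, $3, cmd2($1), cmd1($1), $4
}' OFS=, test > newFileTest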
I executed this against a file with 20000 records in seconds, compared to the original awk, which took around 5 minutes.
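In the date-based variants most of the time goes into starting external processes for every input line, so the gain depends on how often dates repeat. A minimal sketch of caching one date call per distinct date and offset is shown here; it is an illustration rather than the code measured above. The names shift_dates_cached, shifted and cache are hypothetical, and the sketch assumes GNU date, which accepts input of the form "YYYYMMDD -N days".

```shell
# Hypothetical wrapper: reads CSV from stdin or from the files given as arguments.
shift_dates_cached() {
    awk '
    BEGIN { FS = OFS = "," }

    # Run GNU date once per distinct (date, offset) pair and remember the result.
    function shifted(date, days,    key, cmd, out) {
        # Only pass 8-digit values to the shell; anything else yields an empty field.
        if (date !~ /^[0-9]+$/ || length(date) != 8) return ""
        key = date SUBSEP days
        if (!(key in cache)) {
            cmd = "date -d \"" date " " days " days\" +%Y%m%d"
            out = ""
            cmd | getline out
            close(cmd)                       # release the pipe before the next command
            cache[key] = out
        }
        return cache[key]
    }

    { print $1, $2, $3, shifted($1, -7), shifted($1, -1), $4 }
    ' "$@"
}

printf '20220601,A,B,1\n20220530,A,B,1\n20220601,C,D,2\n' | shift_dates_cached
```

With this approach a file of 100K lines covering a few hundred distinct dates would start only a few hundred date processes. If every line carries a different date, nothing is saved and the mktime() approach remains the better fit.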