Using awk to match a column across two files and then subtract dates for the matching records

Using awk, I want to match the first column of one file against the first column of another and then do a date subtraction (in days) for the matching records.

Let's suppose I have two files:

file1:

123,2-jul-2016
124,2-jul-2018

file2:

123,2-jul-2015
124,2-jul-2017

If the first columns match, the output should be:

123,366
124,366

Thanks for the help

Could you please try the following? It uses mktime(), which is a GNU awk extension.

awk '
BEGIN{
  FS=OFS=","
  # Map month abbreviations to month numbers 1..12.
  num=split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec",month,",")
  for(i=1;i<=num;i++){
    daymonth[month[i]]=i}
}
# First file named on the command line (Input_file2): store its date per key.
FNR==NR{
  a[$1]=$2
  next
}
# Second file (Input_file1): for matching keys, print the difference in days.
($1 in a){
  split($2,array2,"-")
  split(a[$1],array1,"-")
  print $1,(mktime(sprintf("%d %d %d 0 0 0 0",array2[3],daymonth[array2[2]],array2[1]))-\
            mktime(sprintf("%d %d %d 0 0 0 0",array1[3],daymonth[array1[2]],array1[1])))\
                  /86400
}'  Input_file2   Input_file1

With GNU awk for time functions:

$ cat tst.awk
BEGIN {
    FS  = "[,-]"
    OFS = ","
}
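# Every line: turn the d-mon-yyyy date into seconds since the epoch.
{
    mthNr = (index("janfebmaraprmayjunjulaugsepoctnovdec",$3)+2)/3
    date  = sprintf("%04d %02d %02d 00 00 00", $4, mthNr, $2)
    secs  = mktime(date)
}
# First file: remember the seconds value for each key.
NR==FNR {
    end[$1] = secs
    next
}
# Second file: print the difference in days for keys present in both files.
$1 in end {
    print $1, int((end[$1] - secs) / (24*60*60))
}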

$ awk -f tst.awk file1 file2
123,366
124,365

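The expected output in your question was wrong for key 124: 2-jul-2017 to 2-jul-2018 is 365 days, because only the 2015 to 2016 interval contains a leap day (29 Feb 2016). Getting 366 for both records would mean counting both end dates for the non-leap interval, i.e. assuming that the difference between 3 and 4 is 2 instead of 1.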
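
The day counts above can be cross-checked without any time functions. Since mktime() works in local time, a difference that spans a daylight-saving change may not be an exact multiple of 86400 seconds. Here is a minimal sketch, not taken from either answer, that converts each date to a Julian Day Number with integer arithmetic and subtracts. The function names date_diff_days and jdn are made up for illustration, and the sketch assumes well-formed d-mon-yyyy dates with English three-letter month abbreviations and keys that contain no hyphen.

```shell
# Hypothetical helper (an assumption, not part of the answers above):
# day differences without mktime(), so it should run in any POSIX awk.
# Usage: date_diff_days file1 file2
date_diff_days() {
  awk -F'[,-]' -v OFS=',' '
    # Convert day, month abbreviation and year to a Julian Day Number
    # (Gregorian calendar, integer arithmetic only).
    function jdn(d, mon, y,    m, a, y2, m2, n) {
      m  = (index("janfebmaraprmayjunjulaugsepoctnovdec", tolower(mon)) + 2) / 3
      a  = int((14 - m) / 12)
      y2 = y + 4800 - a
      m2 = m + 12 * a - 3
      n  = d + int((153 * m2 + 2) / 5) + 365 * y2
      n += int(y2 / 4) - int(y2 / 100) + int(y2 / 400) - 32045
      return n
    }
    # First file: remember the day number for each key.
    NR == FNR { first[$1] = jdn($2, $3, $4); next }
    # Second file: print the difference only for keys seen in the first file.
    $1 in first { print $1, first[$1] - jdn($2, $3, $4) }
  ' "$1" "$2"
}
```

Called as date_diff_days file1 file2 on the two files from the question, this should print 123,366 and 124,365, the same result as the GNU awk version.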
