
Error in writing output file through AWK scripting

I have an AWK script that writes values matching specific patterns to a .csv file. The code is as follows:

BEGIN{print "Query Start,Query End, Target Start, Target End,Score, E,P,GC"}
/^\>g/ { Query=$0 }
 /Query =/{
    split($0,a," ")
    query_start=a[3]
    query_end=a[5]
    query_end=gsub(/,/,"",query_end)
    target_start=a[8]
    target_end=a[10]
    }
    /Score =/{
    split($0,a," ")
    score=a[3]
    score=gsub(/,/,"",score)
    e=a[6]
    e=gsub(/,/,"",e)
    p=a[9]
    p=gsub(/,/,"",p)
    gc=a[12]

    printf("%s,%s,%s,%s,%s,%s,%s,%s\n",query_start, query_end,target_start,target_end,score,e,p,gc)
    }

The input file is as follows:

>gi|ABCDEF|

 Plus strand results:

 Query = 100 - 231, Target = 100 - 172
 Score = 20.92, E = 0.01984, P = 4.309e-08, GC =  51

But the output I got in the .csv file was:

100 0   100 172 0   0   0   51

The program failed to copy the values of Query End, Score, E, and P. (Note: every value that failed is immediately followed by a comma in the input.)

Any help to obtain the right output will be great.

Best regards,

Amit

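As @Jidder mentioned, you don't need to call split(), and as @jaypal mentioned, you're using gsub() incorrectly: it modifies its target in place and returns the number of substitutions made, not the modified string. But you also don't need to call gsub() at all if you just include , in your FS.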
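
To see why assigning the result of gsub() loses the value, here is a minimal sketch run from the shell. The sample value "231," is the Query end field from the question's input; the variable names wrong and right are illustrative only and do not appear in the original script.

```shell
out=$(printf '231,\n' | awk '{
    wrong = $1
    wrong = gsub(/,/, "", wrong)   # gsub() returns the substitution count, so wrong becomes 1
    right = $1
    gsub(/,/, "", right)           # called only for its side effect, so right becomes 231
    print wrong, right
}')
echo "$out"
```

This prints 1 231: the first variable ends up holding the count of replaced commas, the second holds the cleaned value.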

Try this:

BEGIN {
    FS = "[[:space:],]+"
    OFS = ","
    print "Query Start","Query End","Target Start","Target End","Score","E","P","GC"
}
/^>g/ { Query=$0 }
/Query =/ {
    query_start=$4
    query_end=$6
    target_start=$9
    target_end=$11
}
/Score =/ {
    score=$4
    e=$7
    p=$10
    gc=$13

    print query_start,query_end,target_start,target_end,score,e,p,gc
}

Does that work? Note that the field numbers are shifted up by 1: when FS is anything other than the default single space, awk no longer strips leading white space, so the leading blank on your input lines produces an empty first field.
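
The shift in field numbers can be checked directly on the Query line from the question. The following is a small sketch, assuming an awk that supports POSIX character classes such as [[:space:]]; the shell variable names are made up for the illustration.

```shell
line=' Query = 100 - 231, Target = 100 - 172'

# Default FS: leading blanks are stripped, so "Query" is $1 and 100 is $3.
default_fs=$(printf '%s\n' "$line" | awk '{ print $1 "|" $3 }')

# Regex FS: the leading blank counts as a separator, so $1 is empty and 100 is $4.
custom_fs=$(printf '%s\n' "$line" | awk -F'[[:space:],]+' '{ print $1 "|" $4 }')

echo "$default_fs"
echo "$custom_fs"
```

The first echo prints Query|100 and the second prints |100, which shows the empty $1 that pushes every other field along by one.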

Obviously, you are not using your Query variable so the line that populates it is redundant.
