Adding constant value as column at the end of file using awk

I want to add a column with a constant value at the end of each line of a file in bash, whilst selecting columns, doing a mathematical operation, and changing the field separator (from what I think is just tab) to space.

My input file:

10:100968448:T:AA       0.3519  10      100968448       t       aa      1.0024  0.01    0.812
10:101574552:A:ATG      0.4493  10      101574552       a       atg     0.98906 0.0097  0.2585
10:102244152:A:AG       0.2008  10      102244152       a       ag      0.996705        0.0114  0.7701
10:102290698:A:AG       0.1899  10      102290698       a       ag      0.993024        0.0114  0.5431
10:104999458:T:TG       0.3449  10      104999458       t       tg      0.956763        0.0101  1.149e-05

If I put the constant in the second-to-last column:

awk -v OFS=" " 'BEGIN { FS = "\t" } ;  {print $1, $5, $6, log($7)/log(10), '105318', $9}' input

It works:

10:100968448:T:AA t aa 0.00104106 105318 0.812
10:101574552:A:ATG a atg -0.00477736 105318 0.2585
10:102244152:A:AG a ag -0.00143336 105318 0.7701
10:102290698:A:AG a ag -0.00304026 105318 0.5431
10:104999458:T:TG t tg -0.0191956 105318 1.149e-05

But when I try putting the constant at the end of each line, which is what I need:

awk -v OFS=" " 'BEGIN { FS = "\t" } ;  {print $1, $5, $6, log($7)/log(10), $9, '105318'}' input

It doesn't really work (it's adding the constant to the first field):

 10531868448:T:AA t aa 0.00104106 0.812
 10531874552:A:ATG a atg -0.00477736 0.2585
 10531844152:A:AG a ag -0.00143336 0.7701
 10531890698:A:AG a ag -0.00304026 0.5431
 10531899458:T:TG t tg -0.0191956 1.149e-05

I even tried using the file where it works and shuffling the columns, and the constant ends up somewhere seemingly random. I have used dos2unix on this file, thinking maybe there's some weird character in it, but the problem remains the same. When I use a comma as the output field separator, I see that multiple commas are generated at the end of the file (when I try to include the constant as the last column).

For clarification, desired output:

10:100968448:T:AA t aa 0.00104106 0.812 105318 
10:101574552:A:ATG a atg -0.00477736 0.2585 105318 
10:102244152:A:AG a ag -0.00143336 0.7701 105318 
10:102290698:A:AG a ag -0.00304026 0.5431 105318 
10:104999458:T:TG t tg -0.0191956 1.149e-05 105318 

Any ideas?

Your input file has DOS line endings. Remove the carriage return characters using dos2unix or a similar tool.
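
Before converting, it can help to confirm that carriage returns are really present. The following is a minimal sketch; the file name sample_crlf.txt is a stand-in created only for illustration, so point grep at your real file instead.

```shell
# Build a small stand-in file with DOS (CRLF) line endings.
printf 'x\t1\r\ny\t2\r\n' > sample_crlf.txt

# Hold a literal carriage return in a variable (portable across POSIX shells).
cr=$(printf '\r')

# Count the lines that end in a carriage return.
crlf_lines=$(grep -c "${cr}\$" sample_crlf.txt)
echo "lines ending in CR: $crlf_lines"
```

A count above zero means the last field on those lines still carries a carriage return.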

What you are seeing happens because the $9 field in awk still contains the carriage return character. When you print another field after it, the carriage return moves the cursor back to the beginning of the line before the last field is printed. What awk actually writes is:

10:100968448:T:AA t aa 0.00104106 0.812<CR> 105318

The CR moves the cursor to the beginning of the line, so " 105318" overwrites the first seven characters, and you see:

 10531868448:T:AA t aa 0.00104106 0.812
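
To make the hidden character visible, the effect can be reproduced on a single record. This is only a sketch: the record is modelled on the first row of the question, and tr is used to replace the carriage return with a visible ^ so the terminal does not act on it.

```shell
# One tab-separated record with a DOS line ending, run through the same
# field selection as the failing command. tr turns the CR into a visible '^'.
visible=$(printf '10:100968448:T:AA\t0.3519\t10\t100968448\tt\taa\t1.0024\t0.01\t0.812\r\n' |
  awk -F'\t' '{print $1, $5, $6, log($7)/log(10), $9, 105318}' |
  tr '\r' '^')
echo "$visible"
```

The printed line ends in `0.812^ 105318`, which shows that the constant is in fact appended last and that the carriage return sits just before it.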

Could you please try the following?

awk '{print $1,$5,$6,log($7)/log(10),$NF,105318}' Input_file

In case you have Control-M (carriage return) characters, as per Kamil's answer, then run the following.

awk '{gsub(/\r/,"");print $1,$5,$6,log($7)/log(10),$NF,105318}' Input_file
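
Note that gsub(/\r/,"") removes every carriage return on the line. A slightly narrower variant, sketched below, strips only a trailing one and keeps the tab input separator from the question. The file name sample_input.txt and the variable name n are assumptions made for this example, not part of the original answers.

```shell
# Stand-in input: two rows from the question, tab-separated, with CRLF endings.
printf '10:100968448:T:AA\t0.3519\t10\t100968448\tt\taa\t1.0024\t0.01\t0.812\r\n' > sample_input.txt
printf '10:101574552:A:ATG\t0.4493\t10\t101574552\ta\tatg\t0.98906\t0.0097\t0.2585\r\n' >> sample_input.txt

# Strip a trailing CR (awk re-splits the fields after $0 changes), then print
# the selected columns with the constant last. The constant is passed with -v
# rather than by breaking out of the single-quoted script.
result=$(awk -F'\t' -v n=105318 '
  { sub(/\r$/, ""); print $1, $5, $6, log($7)/log(10), $9, n }
' sample_input.txt)
echo "$result"
```

Passing the constant with -v also makes it easy to change the value without editing the awk program itself.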
