I have a file with many columns and rows, and I want to remove the rows in which the fourth or fifth column contains more than one character.
Input:
--- 22:16050115:G:A 16050115 GGG A
--- 22:16050213:C:T 16050213 C T
--- 22:16050319:C:T 16050319 C T
--- 22:16050527:C:A 16050527 C AAA
--- 22:16050568:C:A 16050568 CC A
--- 22:16050607:G:A 16050607 G A
--- 22:16050627:G:T 16050627 G TGG
--- 22:16050646:G:T 16050646 G T
--- 22:16050655:G:A 16050655 GTAA A
...
Desired output:
--- 22:16050213:C:T 16050213 C T
--- 22:16050319:C:T 16050319 C T
--- 22:16050607:G:A 16050607 G A
--- 22:16050646:G:T 16050646 G T
...
Thank you very much.
awk 'length($4)==1 && length($5)==1' inputfile
--- 22:16050213:C:T 16050213 C T
--- 22:16050319:C:T 16050319 C T
--- 22:16050607:G:A 16050607 G A
--- 22:16050646:G:T 16050646 G T
This will check the length of $4 and $5 using the length() function of awk. It uses the comparison operator ==. You can change it to <, >, <=, etc. So the above command will print the lines which have only one character in both their 4th and 5th columns.
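As a rough sketch of how swapping the operator changes the filter, the snippet below runs three variants against a small sample. The file name sample.txt is only an assumption for illustration, and its rows are copied from the input in the question.

```shell
# Hypothetical sample file (the name sample.txt is an assumption);
# the rows are taken from the input shown in the question.
cat > sample.txt <<'EOF'
--- 22:16050115:G:A 16050115 GGG A
--- 22:16050213:C:T 16050213 C T
--- 22:16050527:C:A 16050527 C AAA
--- 22:16050568:C:A 16050568 CC A
EOF

# Same filter as above: keep rows where both columns are exactly one character.
awk 'length($4)==1 && length($5)==1' sample.txt

# Inverse filter: keep rows where either column is longer than one character.
awk 'length($4)>1 || length($5)>1' sample.txt

# Looser filter: keep rows where both columns are at most two characters.
awk 'length($4)<=2 && length($5)<=2' sample.txt
```

Note that the inverse filter uses || rather than &&, because a row should be reported as soon as either column is too long.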