I would like to edit a Fastq file using Bash, applying the same change to multiple similar lines.
In Fastq files a sequence read starts on line 2 and is then found every fourth line (i.e. lines 2, 6, 10, 14, ...).
I would like to create an edited text file that is identical to a Fastq file except the first 6 characters of the sequencing reads are trimmed off.
Unedited Fastq:
@M03017:21:000000000
GAGAGATCTCTCTCTCTCTCT
+
111>>B1FDFFF
Edited Fastq:
@M03017:21:000000000
TCTCTCTCTCTCTCT
+
111>>B1FDFFF
I guess awk is perfect for this:
$ awk 'NR%4==2 {gsub(/^.{6}/,"")} 1' file
@M03017:21:000000000
TCTCTCTCTCTCTCT
+
111>>B1FDFFF
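A variant that may be handy if the number of characters changes from run to run: a minimal sketch that passes the count in as an awk variable and uses substr() instead of a regex. The variable name n and the temporary sample file are assumptions made for illustration, not part of the original answer. It also avoids the {6} interval expression, which some older awk implementations do not support by default.

```shell
# Build a one-record sample file from the record shown in the question.
sample=$(mktemp)
printf '%s\n' '@M03017:21:000000000' 'GAGAGATCTCTCTCTCTCTCT' '+' '111>>B1FDFFF' > "$sample"

# n (hypothetical name) is the number of leading characters to drop.
# substr($0, n + 1) keeps everything from character n+1 to the end of the line.
# The trailing 1 is an always-true condition, so every line is printed.
trimmed=$(awk -v n=6 'NR % 4 == 2 { $0 = substr($0, n + 1) } 1' "$sample")
printf '%s\n' "$trimmed"

rm -f "$sample"
```

Changing -v n=6 to another value trims a different number of characters without touching the program text.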
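This removes the first 6 characters from every line in a 4k+2 position (lines 2, 6, 10, ...).
NR%4==2 {...} runs the action only when the record number (here, the line number) has the form 4k+2.
gsub(/^.{6}/,"") replaces the first 6 characters with an empty string. Because the pattern is anchored with ^ it can match at most once per line, so sub() would work equally well.
1 is a condition that always evaluates to true, so awk performs its default action and prints the line.
GNU sed can do that: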
sed -i~ '2~4s/^.\{6\}//' file
The address 2~4 means "start on line 2, repeat every 4 lines" (the first~step address form is a GNU extension).
s means substitute, ^ matches the beginning of the line, . matches any character, and \{6\} specifies how many times to repeat it (a "quantifier"). The replacement string is empty (//).
-i~ edits the file in place, leaving a backup with ~ appended to the filename.
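One thing to watch for: in a Fastq record the quality string on line 4 normally has one character per base on line 2, so trimming only the sequence can leave the two lines with different lengths, which some downstream tools reject. If that matters, the same idea can be extended to the quality lines. This is a rough sketch, assuming strict 4-line records (no wrapped sequences); the record and the variable name n are made up for illustration.

```shell
# Hypothetical record whose sequence and quality strings have equal length.
sample=$(mktemp)
printf '%s\n' '@read1' 'GAGAGATCTCTC' '+' '111>>B1FDFFF' > "$sample"

# NR % 4 == 2 selects sequence lines, NR % 4 == 0 selects quality lines.
# Both lose their first n characters, so their lengths stay in step.
both_trimmed=$(awk -v n=6 'NR % 4 == 2 || NR % 4 == 0 { $0 = substr($0, n + 1) } 1' "$sample")
printf '%s\n' "$both_trimmed"

rm -f "$sample"
```

Selecting by line number rather than by content is deliberate: a quality line may itself begin with @ or +, so matching on the first character is not reliable. With GNU sed, the address 2~2 (lines 2, 4, 6, ...) should pick out the same sequence and quality lines.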