
Using Bash to Manually Edit a Text or Fastq file

I would like to use Bash to apply the same edit to many similar lines of a Fastq file.

In Fastq files, the first sequence read is on line 2, and another follows every fourth line after that (i.e. lines 2, 6, 10, 14, ...).
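To check which lines would be affected before editing anything, the sequence lines can be printed on their own. This is only a sketch: the file name two_reads.fastq and both records in it are made up for illustration.

```shell
# Two made-up records for illustration (hypothetical file name: two_reads.fastq).
printf '%s\n' \
  '@read1' 'GAGAGATCTCTC' '+' 'FFFFFFFFFFFF' \
  '@read2' 'ACGTACGTACGT' '+' 'FFFFFFFFFFFF' > two_reads.fastq

# Print only the sequence lines: line numbers that leave remainder 2 when divided by 4.
seq_lines=$(awk 'NR % 4 == 2' two_reads.fastq)
printf '%s\n' "$seq_lines"
```

Only the two sequence strings should appear; if a header or quality line shows up, the file probably has blank or wrapped lines and the every-fourth-line assumption does not hold.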

I would like to create an edited text file that is identical to a Fastq file except the first 6 characters of the sequencing reads are trimmed off.

Unedited Fastq:

@M03017:21:000000000
GAGAGATCTCTCTCTCTCTCT
+
111>>B1FDFFF

Edited Fastq:

@M03017:21:000000000
TCTCTCTCTCTCTCT
+
111>>B1FDFFF

I guess awk is perfect for this:

$ awk 'NR%4==2 {gsub(/^.{6}/,"")} 1' file
@M03017:21:000000000
TCTCTCTCTCTCTCT
+
111>>B1FDFFF

This removes the first 6 characters from every line whose number has the form 4k+2 (lines 2, 6, 10, ...).

Explanation

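  • NR%4==2 {} runs the action only when the record number (the line number) has the form 4k+2.
  • gsub(/^.{6}/,"") replaces the first 6 characters with the empty string. Because the pattern is anchored with ^, it matches at most once per line, so sub would work equally well.
  • 1 always evaluates to true, which triggers awk's default action: print the (possibly modified) line.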
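The same idea can be written with substr instead of a regular expression, with the result redirected to a new file so the original stays untouched. This is a minimal sketch, not the answer's own method: the file names reads.fastq and trimmed.fastq are hypothetical. substr is used here because it avoids relying on the {6} interval syntax, which not every awk implementation has historically supported.

```shell
# Recreate the one-record example from the question (hypothetical name: reads.fastq).
printf '%s\n' '@M03017:21:000000000' 'GAGAGATCTCTCTCTCTCTCT' '+' '111>>B1FDFFF' > reads.fastq

# On every sequence line (2, 6, 10, ...), keep the text from character 7 onward.
# The trailing 1 prints every line, modified or not.
awk 'NR % 4 == 2 { $0 = substr($0, 7) } 1' reads.fastq > trimmed.fastq

cat trimmed.fastq
```

As in the question, only the sequence line is shortened. Tools that expect the quality line (line 4 of each record) to be as long as the sequence would need that line trimmed as well, for example by changing the condition to NR % 4 == 2 || NR % 4 == 0.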

GNU sed can do that:

sed -i~ '2~4s/^.\{6\}//' file

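The address 2~4 means "start on line 2, then repeat every 4 lines". This first~step form is a GNU extension.

s means substitute, ^ matches the beginning of the line, . matches any character, and \{6\} is a quantifier that repeats the preceding . exactly 6 times. The replacement string is empty (nothing between the last two slashes).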

-i~ replaces the file in place, leaving a backup with the ~ appended to the filename.
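A small sketch of what the in-place edit leaves behind, assuming GNU sed is installed (the 2~4 address and the -i option with an attached suffix will not work the same way in other sed implementations). The file name sample.fastq is hypothetical.

```shell
# Recreate the one-record example from the question (hypothetical name: sample.fastq).
printf '%s\n' '@M03017:21:000000000' 'GAGAGATCTCTCTCTCTCTCT' '+' '111>>B1FDFFF' > sample.fastq

# Trim the first 6 characters of lines 2, 6, 10, ... in place (GNU sed only).
sed -i~ '2~4s/^.\{6\}//' sample.fastq

# The backup keeps the original read; the edited file holds the trimmed one.
sed -n '2p' sample.fastq~
sed -n '2p' sample.fastq
```

If the result is not what was intended, moving sample.fastq~ back over sample.fastq restores the original.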
