Manually edit a text or Fastq file using Bash

Question

I would like to use Bash to apply the same manual edit to many similar lines of a Fastq file.

In a Fastq file, the first sequence read is on line 2, and the following reads are on every fourth line after that (i.e. lines 2, 6, 10, 14, ...).
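A quick way to check which line numbers follow that pattern. This is a minimal sketch that works on plain line numbers rather than a real Fastq file; `sequence_line_numbers` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# sequence_line_numbers N: hypothetical helper that prints the positions of
# sequence lines among the first N lines of a 4-line-per-record Fastq file.
sequence_line_numbers() {
    # seq 1 N prints 1..N, so awk's record number NR equals the printed value;
    # keep only the lines where NR % 4 == 2.
    seq 1 "$1" | awk 'NR % 4 == 2'
}

sequence_line_numbers 16
```

With 16 lines this prints 2, 6, 10 and 14, one per line: the "4k+2" positions used by the answers below.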

I would like to create an edited text file that is identical to the Fastq file, except that the first 6 characters of each sequencing read are trimmed off.

Unedited Fastq:

@M03017:21:000000000
GAGAGATCTCTCTCTCTCTCT
+
111>>B1FDFFF

Edited Fastq:

@M03017:21:000000000
TCTCTCTCTCTCTCT
+
111>>B1FDFFF
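Since the question asks about Bash specifically, here is a minimal pure-Bash sketch (no awk or sed) that reads the input four lines at a time and uses the parameter expansion `${seq:6}` to drop the first 6 characters of the sequence line. It assumes every record is exactly four lines and that the file ends with a newline (otherwise the last record is skipped); `trim_reads`, `input.fq` and `trimmed.fq` are hypothetical names. Like the question, it leaves the quality line untouched, although most Fastq tools expect the quality string to be as long as the sequence, so a real trimming job would usually cut line 4 as well.

```shell
#!/usr/bin/env bash
# trim_reads: hypothetical helper; reads Fastq on stdin, writes to stdout.
# Assumes strict 4-line records: header, sequence, '+', quality.
trim_reads() {
    local header seq plus qual
    # Read one whole record per iteration; stop as soon as any read fails.
    while IFS= read -r header &&
          IFS= read -r seq &&
          IFS= read -r plus &&
          IFS= read -r qual; do
        # ${seq:6} is the substring starting at offset 6,
        # i.e. the sequence without its first 6 characters.
        printf '%s\n%s\n%s\n%s\n' "$header" "${seq:6}" "$plus" "$qual"
    done
}

# Usage (hypothetical file names):
#   trim_reads < input.fq > trimmed.fq
```

A loop like this is fine for small files but is much slower than awk or sed on large sequencing runs.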

Answer 1

I guess awk is perfect for this:

$ awk 'NR%4==2 {gsub(/^.{6}/,"")} 1' file
@M03017:21:000000000
TCTCTCTCTCTCTCT
+
111>>B1FDFFF

This removes the first 6 characters from every line in a 4k+2 position (lines 2, 6, 10, ...).

Explanation

NR%4==2 {}: run the action in the braces only when the record number (here, the line number) has the form 4k+2.
gsub(/^.{6}/,""): replace the first 6 characters with an empty string.
1: a pattern that always evaluates to true; with no action attached, awk prints the line, whether or not it was edited.
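An equivalent awk sketch can use `substr()` instead of a regex: `substr($0, n + 1)` returns the line from character n+1 onward. This avoids the `{6}` interval expression, which some awk implementations (notably older mawk versions) did not support. One behavioural difference: for a sequence line shorter than 6 characters, the `gsub` version leaves it unchanged, while `substr` returns what is left (possibly an empty string). `trim_reads_awk` and its optional count argument are hypothetical:

```shell
#!/usr/bin/env bash
# trim_reads_awk [N]: hypothetical wrapper; Fastq on stdin, trimmed Fastq on stdout.
# N is the number of leading characters to drop from each read (default 6).
trim_reads_awk() {
    # On sequence lines (NR % 4 == 2) keep everything from character N+1 on;
    # the trailing 1 prints every line, edited or not.
    awk -v n="${1:-6}" 'NR % 4 == 2 { $0 = substr($0, n + 1) } 1'
}

# Usage (hypothetical file names):
#   trim_reads_awk 6 < input.fq > trimmed.fq
```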

Answer 2

GNU sed can do that:

sed -i~ '2~4s/^.\{6\}//' file

The address 2~4 means "start on line 2, then repeat every 4 lines".
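The `first~step` address form is a GNU sed extension, so the command above may fail with BSD or macOS sed. A minimal portable sketch, assuming strict 4-line records, steps through each record with the POSIX `n` command instead of a step address; `trim_reads_sed` is a hypothetical name:

```shell
#!/usr/bin/env bash
# trim_reads_sed: hypothetical wrapper; Fastq on stdin, trimmed Fastq on stdout.
# Each sed cycle starts on a header line (1, 5, 9, ...):
#   n          print the header, load the sequence line
#   s/.../     delete the first 6 characters of the sequence
#   n;n        print the sequence and '+' lines, load the quality line
# The quality line is printed automatically at the end of the cycle.
trim_reads_sed() {
    sed 'n;s/^.\{6\}//;n;n'
}

# Usage (hypothetical file names):
#   trim_reads_sed < input.fq > trimmed.fq
```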

s means replace, ^ matches the beginning of the line, . matches any character, and \{6\} repeats the preceding . exactly 6 times (a "quantifier"). The replacement string is empty (//).
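Because `^.\{6\}` only matches when a line has at least 6 characters, a read shorter than that is left as it is rather than emptied. The same expression can also be written as an extended regex with `-E`, which drops the backslashes. A small sketch on two made-up records (the second read, `ACGT`, is deliberately too short):

```shell
#!/usr/bin/env bash
# Two made-up records; the second sequence has only 4 characters.
sample='@r1
GAGAGATCTCT
+
IIIIIIIIIII
@r2
ACGT
+
IIII'

# Basic-regex form from the answer, then the equivalent extended-regex form
# (both use the GNU-only 2~4 address).
printf '%s\n' "$sample" | sed '2~4s/^.\{6\}//'
printf '%s\n' "$sample" | sed -E '2~4s/^.{6}//'
```

Both commands turn line 2 into `TCTCT` and leave `ACGT` on line 6 unchanged.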

-i~ edits the file in place and keeps a backup of the original, named with ~ appended to the filename (e.g. file~).
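The question asks for a separate edited file, so redirecting sed's output may be closer to what was wanted than editing in place. A minimal sketch that runs in a temporary directory and creates its own tiny input; `input.fq`, `trimmed.fq` and `copy.fq` are hypothetical names:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical one-record input: the example from the question.
printf '%s\n' '@M03017:21:000000000' 'GAGAGATCTCTCTCTCTCTCT' '+' '111>>B1FDFFF' > input.fq

# Same GNU sed edit, written to a new file; input.fq is left unchanged.
sed '2~4s/^.\{6\}//' input.fq > trimmed.fq

# For comparison, the in-place form from the answer, run on a copy:
# copy.fq is edited and its original content is kept in copy.fq~.
cp input.fq copy.fq
sed -i~ '2~4s/^.\{6\}//' copy.fq

cat trimmed.fq
```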

Answer 1: 1 vote, posted 2015-02-16 15:57:31.
Answer 2: 1 vote, accepted, posted 2015-02-16 16:22:54.
