Using Bash to Manually Edit a Text or Fastq file
I would like to use Bash to make the same manual edit to multiple similar lines of a Fastq file.
In a Fastq file, the sequence read is on line 2 and then recurs every fourth line (i.e. lines 2, 6, 10, 14, ...).
I would like to create an edited text file that is identical to the Fastq file except that the first 6 characters of each sequencing read are trimmed off.
Unedited Fastq:
@M03017:21:000000000
GAGAGATCTCTCTCTCTCTCT
+
111>>B1FDFFF
Edited Fastq:
@M03017:21:000000000
TCTCTCTCTCTCTCT
+
111>>B1FDFFF
I guess awk is perfect for this:
$ awk 'NR%4==2 {gsub(/^.{6}/,"")} 1' file
@M03017:21:000000000
TCTCTCTCTCTCTCT
+
111>>B1FDFFF
This removes the first 6 characters from every line in a 4k+2 position.
NR%4==2 {} runs the action when the record number (line number) has the form 4k+2.
gsub(/^.{6}/,"") replaces the first 6 characters with the empty string.
1 evaluates to true, so awk prints the line.
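The same idea can be tried on a throwaway file before touching real data. The sketch below is only an illustration: the sample records, the quality strings and the file names are made up, and it uses substr() instead of the regex so that it does not depend on the awk implementation supporting the {6} interval syntax (some older awk builds do not). The second command is an assumption about what downstream tools may want: it also trims the quality line so that it stays the same length as the read.

```shell
# Build a small two-record sample file (reads and quality strings are made up).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.fastq" <<'EOF'
@read1
GAGAGATCTCTCTCTCTCTCT
+
111>>B1FDFFFGGGGGGGGG
@read2
ACGTACGTACGTACGTACGTA
+
IIIIIIIIIIIIIIIIIIIII
EOF

# Trim the first 6 characters of the sequence line only (lines 2, 6, 10, ...).
# substr($0, 7) keeps everything from the 7th character onward.
# Unlike the regex version, a line shorter than 6 characters becomes empty.
awk 'NR%4==2 {$0 = substr($0, 7)} 1' "$tmpdir/sample.fastq" > "$tmpdir/seq_trimmed.fastq"

# Variant: also trim the quality line (lines 4, 8, 12, ...), so that
# sequence and quality keep the same length.
awk 'NR%4==2 || NR%4==0 {$0 = substr($0, 7)} 1' "$tmpdir/sample.fastq" > "$tmpdir/both_trimmed.fastq"

cat "$tmpdir/both_trimmed.fastq"
```

Writing to a new file, as here, leaves the original untouched, which makes it easy to compare the two before replacing anything.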
GNU sed can do that:
sed -i~ '2~4s/^.\{6\}//' file
The address 2~4 means "start on line 2, repeat every 4 lines". s means substitute, ^ matches the beginning of the line, . matches any character, and \{6\} specifies the length (a "quantifier"). The replacement string is empty (//).
-i~ edits the file in place, leaving a backup with ~ appended to the filename.
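To see what the in-place edit and its backup look like, the command can be run on a scratch copy first. This is a minimal sketch assuming GNU sed is installed (the first~step address form is a GNU extension); the directory and file name are made up for the example.

```shell
# Work on a scratch copy so the in-place edit cannot touch real data.
workdir=$(mktemp -d)
printf '%s\n' '@M03017:21:000000000' 'GAGAGATCTCTCTCTCTCTCT' '+' '111>>B1FDFFF' > "$workdir/reads.fastq"

# GNU sed: edit in place, keeping the original as reads.fastq~
sed -i~ '2~4s/^.\{6\}//' "$workdir/reads.fastq"

# Line 2 of the backup is still the untrimmed read ...
sed -n 2p "$workdir/reads.fastq~"
# ... while line 2 of the edited file has lost its first 6 characters.
sed -n 2p "$workdir/reads.fastq"
```

If the result is not what was intended, the backup can simply be moved back over the edited file.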