
sed find and replace fastq regex

I have a file such as:

head testSed.fastq
@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA
NGTCACTN
+
#>AAAAF#
@M01551:51:000000000-BCB7H:1:1101:15605:1331 1:N:0:NATCAGCN+TAGATCGCCAAGTTAA
NATCAGCN
+
#>>AA?C#
@M01551:51:000000000-BCB7H:1:1101:15557:1332 1:N:0:NCAGCAGN+TATCTTCTATAAATAT
NCAGCAGN

I am attempting to replace the string after the final colon with 0, using a regular expression. In this example that affects lines 1, 5 and 9, but the replacement should apply to every header line in the file.

I have checked my regex with egrep '[ATGCN]{8}\+[ATGCN]{16}$' testSed.fastq, which returns all the lines I would expect.

However, when I try sed -i 's/[ATGCN]{8}\+[ATGCN]{16}$/0/g' testSed.fastq, the original file is unchanged and no replacement occurs.

How can I fix this? Is my regex not specific enough?

Do you need a regex for this?

awk -F: -v OFS=: '/^@/ {$NF = "0"} 1' testfile

That won't save in place. If you have GNU awk (4.1.0 or later), you can use its inplace extension:

gawk -F: -v OFS=: -i inplace '...' file

ref: https://www.gnu.org/software/gawk/manual/html_node/Extension-Sample-Inplace.html
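Without GNU awk, a common workaround is to write the output to a temporary file and move it over the original. The following is a minimal sketch of that idea, not part of the original answer; the scratch directory and the file name demo.fastq are invented for illustration.

```shell
# Hypothetical scratch copy of the first record from the question.
work=$(mktemp -d)
cat > "$work/demo.fastq" <<'EOF'
@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA
NGTCACTN
+
#>AAAAF#
EOF

# Write the edited output to a temp file, then replace the original
# only if awk exited successfully.
awk -F: -v OFS=: '/^@/ {$NF = "0"} 1' "$work/demo.fastq" > "$work/demo.fastq.tmp" &&
    mv "$work/demo.fastq.tmp" "$work/demo.fastq"

first_line=$(head -n 1 "$work/demo.fastq")
line_count=$(wc -l < "$work/demo.fastq" | tr -d ' ')
printf '%s\n' "$first_line"
rm -rf "$work"
```

Note that the moved file takes the permissions of the temporary file, so this may not be a drop-in replacement where ownership or mode bits matter.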

Your regex is written as an ERE, but sed interprets patterns as BRE by default. In a BRE, {8} is read as the literal characters {, 8 and }, so the pattern never matches and nothing is replaced. Not all sed implementations support ERE; check man sed in your environment and look for a -r or -E option. Alternatively, stay in BRE and write the bounds with backslash-escaped braces, for example \{8\}.
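Both options can be sketched as follows. This assumes a sed that accepts -E (GNU and BSD sed do; older GNU versions document it as -r), and it uses a temporary file invented for the example. In BRE an unescaped + is already a literal plus sign, so only the braces need escaping.

```shell
# Hypothetical scratch file holding one header line from the question.
demo=$(mktemp)
printf '%s\n' '@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA' > "$demo"

# Option 1: ask sed for ERE, so {8} is a bound and \+ is a literal plus.
ere_out=$(sed -E 's/[ATGCN]{8}\+[ATGCN]{16}$/0/' "$demo")

# Option 2: stay in BRE and escape the braces; a bare + is literal in BRE.
bre_out=$(sed 's/[ATGCN]\{8\}+[ATGCN]\{16\}$/0/' "$demo")

printf '%s\n%s\n' "$ere_out" "$bre_out"
rm -f "$demo"
```

Once the output looks right, adding -i to either command applies the change to the file itself, as in the original attempt.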

That said, rather than matching the precise text in the last field, why not just look for a colon that is followed by no further colons up to the end of the line? The following RE is compatible with both BRE and ERE.

$ sed '/^@/s/:[^:]*$/:0/' testq
@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:0
NGTCACTN
+
#>AAAAF#
@M01551:51:000000000-BCB7H:1:1101:15605:1331 1:N:0:0
NATCAGCN
+
#>>AA?C#
@M01551:51:000000000-BCB7H:1:1101:15557:1332 1:N:0:0
NCAGCAGN
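One caveat with addressing lines by /^@/: in FASTQ, @ and : are both valid quality characters, so a quality line can begin with @ and contain a colon. The sketch below applies the same "colon followed by no more colons" pattern by line position instead. It assumes strict four-line records (no wrapped sequences), and the record with the awkward quality string is made up for illustration.

```shell
# Hypothetical record whose quality line starts with '@' and contains ':'.
rec=$(mktemp)
printf '%s\n' \
    '@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA' \
    'NGTCACTN' \
    '+' \
    '@>AAA:F#' > "$rec"

# Edit only lines 1, 5, 9, ... (assumes exactly four lines per record).
by_position=$(awk 'NR % 4 == 1 { sub(/:[^:]*$/, ":0") } 1' "$rec")

# For comparison: the /^@/ address also rewrites the quality line.
by_prefix=$(sed '/^@/s/:[^:]*$/:0/' "$rec")

printf '%s\n---\n%s\n' "$by_position" "$by_prefix"
rm -f "$rec"
```

For data like the sample in the question, where no quality line starts with @, both forms give the same result.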
