
sed find and replace fastq regex

I have a file such as

head testSed.fastq
@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA
NGTCACTN
+
#>AAAAF#
@M01551:51:000000000-BCB7H:1:1101:15605:1331 1:N:0:NATCAGCN+TAGATCGCCAAGTTAA
NATCAGCN
+
#>>AA?C#
@M01551:51:000000000-BCB7H:1:1101:15557:1332 1:N:0:NCAGCAGN+TATCTTCTATAAATAT
NCAGCAGN

I am trying to use a regular expression to replace the string after the final colon with 0 on every header line (lines 1, 5 and 9 in this example, but throughout the whole file).

I have checked my regex with egrep '[ATGCN]{8}\+[ATGCN]{16}$' testSed.fastq, which returns all the lines I would expect.

However, when I run sed -i 's/[ATGCN]{8}\+[ATGCN]{16}$/0/g' testSed.fastq, the file is unchanged and no replacement occurs.

How can I fix this? Is my regex not specific enough?

Do you need a regex for this?

awk -F: -v OFS=: '/^@/ {$NF = "0"} 1' testfile

That won't edit the file in place. If you have GNU awk, you can use its inplace extension:

gawk -F: -v OFS=: -i inplace '...' file

ref: https://www.gnu.org/software/gawk/manual/html_node/Extension-Sample-Inplace.html
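Without GNU awk, a common portable pattern is to write to a temporary file and move it over the original only if awk succeeded. A minimal sketch reusing the awk program above; the file names sample.fastq and sample.fastq.tmp are hypothetical:

```shell
#!/bin/sh
# Hypothetical sample file holding one record from the question.
printf '%s\n' \
  '@M01551:51:000000000-BCB7H:1:1101:15605:1331 1:N:0:NATCAGCN+TAGATCGCCAAGTTAA' \
  'NATCAGCN' '+' '#>>AA?C#' > sample.fastq

# Write the result next to the original, then replace the original
# only if awk exited successfully (the && chain stops on failure).
tmp=sample.fastq.tmp
awk -F: -v OFS=: '/^@/ {$NF = "0"} 1' sample.fastq > "$tmp" &&
  mv "$tmp" sample.fastq

cat sample.fastq
```

The header now ends in 1:N:0:0 and the other three lines are unchanged.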

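Your regex is written as an ERE (extended regular expression), but sed interprets patterns as BREs (basic regular expressions) by default, so {8} and {16} are treated as literal text and never match. Not all sed implementations support ERE, but you can check man sed in your environment to see whether yours does; look for the -r or -E options. Alternatively, you can keep BRE syntax and turn the braces into bounds by escaping them, as in \{8\}. In that case, also drop the backslash before +, because + is already literal in a BRE and GNU sed reads \+ as "one or more".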
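A minimal sketch of both fixes, assuming a sed that accepts -E (GNU and BSD sed both do). It writes to new files instead of using -i, since the -i syntax differs between GNU and BSD sed. The file names are hypothetical; the record is copied from the question.

```shell
#!/bin/sh
# Hypothetical sample file built from a record in the question.
cat > sample.fastq <<'EOF'
@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA
NGTCACTN
+
#>AAAAF#
EOF

# ERE: -E makes {8} a bound, and \+ stays a literal plus sign.
sed -E 's/[ATGCN]{8}\+[ATGCN]{16}$/0/' sample.fastq > ere.fastq

# BRE: escape the braces to make them bounds, and leave + unescaped
# because + is already literal in a BRE.
sed 's/[ATGCN]\{8\}+[ATGCN]\{16\}$/0/' sample.fastq > bre.fastq

head -n 1 ere.fastq
head -n 1 bre.fastq
```

Both commands print the header ending in 1:N:0:0. The g flag from the question is not needed, because the pattern is anchored with $ and can match at most once per line.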

That said, rather than matching the exact text in the last field, why not just match a colon followed by a run of non-colon characters up to the end of the line? The following RE works as both a BRE and an ERE.

$ sed '/^@/s/:[^:]*$/:0/' testq
@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:0
NGTCACTN
+
#>AAAAF#
@M01551:51:000000000-BCB7H:1:1101:15605:1331 1:N:0:0
NATCAGCN
+
#>>AA?C#
@M01551:51:000000000-BCB7H:1:1101:15557:1332 1:N:0:0
NCAGCAGN
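One caveat with the /^@/ address used in both answers: in FASTQ, @ and : are also valid quality characters, so a quality line can begin with @ and contain a colon. With the awk version, a quality line that starts with @ and has no colon would be replaced entirely by 0. Assuming strictly four-line records (no wrapped sequences), a safer sketch selects header lines by position instead. The quality string @>AA:AF# and the file names below are made up to show the problem.

```shell
#!/bin/sh
# Hypothetical record whose quality line starts with '@' and contains ':'.
printf '%s\n' \
  '@M01551:51:000000000-BCB7H:1:1101:15800:1330 1:N:0:NGTCACTN+TATCCTCTCTTGAAGA' \
  'NGTCACTN' '+' '@>AA:AF#' > tricky.fastq

# /^@/ also matches the quality line and rewrites its text after the last colon.
sed '/^@/s/:[^:]*$/:0/' tricky.fastq > by_pattern.fastq

# Line 1 of every 4-line record is the header, so select lines 1, 5, 9, ...
awk -F: -v OFS=: 'NR % 4 == 1 {$NF = "0"} 1' tricky.fastq > by_position.fastq
# GNU sed equivalent (first~step is a GNU extension):
#   sed '1~4s/:[^:]*$/:0/' tricky.fastq

sed -n 4p by_pattern.fastq
sed -n 4p by_position.fastq
```

The first command prints the corrupted quality line @>AA:0, while the position-based version leaves @>AA:AF# intact and still rewrites the header.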
