I'm trying to substitute a multiline block using a Perl one-liner on the command line. The text is the following:
@LNCaP.2622 GAPC:1:1:4519:1350 length=76
TTTCCATTGCAGGTTTTAAAGTGGAGATTCTGAAGGGGAAAATAGGCACTGTCAGAACAAAGCTACCTGGAAACAG
+LNCaP.2622 GAPC:1:1:4519:1350 length=76
DD@:BBBBDDD@D:B::=:6:(6//;589444004':839>>2;;:':>>:7B:><B<B#################
@LNCaP.2623 GAPC:1:1:4767:1343 length=76

+LNCaP.2623 GAPC:1:1:4767:1343 length=76

@LNCaP.2624 GAPC:1:1:4794:1349 length=76
I tried to run the following command:
perl -pe "s/^@.*\n\s*\n+//mg" test.txt
hoping to get the following output:
@LNCaP.2622 GAPC:1:1:4519:1350 length=76
TTTCCATTGCAGGTTTTAAAGTGGAGATTCTGAAGGGGAAAATAGGCACTGTCAGAACAAAGCTACCTGGAAACAG
+LNCaP.2622 GAPC:1:1:4519:1350 length=76
DD@:BBBBDDD@D:B::=:6:(6//;589444004':839>>2;;:':>>:7B:><B<B#################
@LNCaP.2624 GAPC:1:1:4794:1349 length=76
The regex ^@.*\n\s*\n\+.*\n\s*\n
matches the 4 lines I want to delete on regex101.com using the text above. However, when I run the command from my shell, the output is unchanged :(
I can't use line numbers since this is an extract from a much bigger file, which means the substitution has to be applied to every 4-line instance that matches that pattern.
Any idea what I am doing wrong?
Thanks
perl -pe
processes the input line by line, so a regex that spans lines is never going to match by default.
You can change the input record separator $/
though, to slurp the entire file and apply the regex to it:
perl -pe "BEGIN { undef $/ } s/^@.*\n\s*\n+//mg" test.txt
The regex you suggested above doesn't provide the output you want though. To do that, you'd need the following expression:
perl -pe "BEGIN { undef $/ } s/^@.*\n\s*\n(?:(?!\@).*\n)*//mg" test.txt
Outputs:
@LNCaP.2622 GAPC:1:1:4519:1350 length=76
TTTCCATTGCAGGTTTTAAAGTGGAGATTCTGAAGGGGAAAATAGGCACTGTCAGAACAAAGCTACCTGGAAACAG
+LNCaP.2622 GAPC:1:1:4519:1350 length=76
DD@:BBBBDDD@D:B::=:6:(6//;589444004':839>>2;;:':>>:7B:><B<B#################
@LNCaP.2624 GAPC:1:1:4794:1349 length=76
Miller is right in his answer: you have to read the whole content of the file into a variable and apply a regular expression to it. Try the following code, where I read the content in slurp mode and use a negated character class [^\n]*
to match the rest of each line and \n{2,}
to match the line ending together with the blank lines that follow:
#!/usr/bin/env perl
use strict;
use warnings;
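# Slurp everything after __DATA__; local restores $/ when the block ends.
my $text = do { local $/; <DATA> };
# Remove a '@' line and the following line when each is followed by a blank line.
$text =~ s/^@(?:[^\n]*\n{2,}){2}//mg;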
print $text;
__DATA__
@LNCaP.2622 GAPC:1:1:4519:1350 length=76
TTTCCATTGCAGGTTTTAAAGTGGAGATTCTGAAGGGGAAAATAGGCACTGTCAGAACAAAGCTACCTGGAAACAG
+LNCaP.2622 GAPC:1:1:4519:1350 length=76
DD@:BBBBDDD@D:B::=:6:(6//;589444004':839>>2;;:':>>:7B:><B<B#################
@LNCaP.2623 GAPC:1:1:4767:1343 length=76

+LNCaP.2623 GAPC:1:1:4767:1343 length=76

@LNCaP.2624 GAPC:1:1:4794:1349 length=76
Run it like:
perl script.pl
That yields:
@LNCaP.2622 GAPC:1:1:4519:1350 length=76
TTTCCATTGCAGGTTTTAAAGTGGAGATTCTGAAGGGGAAAATAGGCACTGTCAGAACAAAGCTACCTGGAAACAG
+LNCaP.2622 GAPC:1:1:4519:1350 length=76
DD@:BBBBDDD@D:B::=:6:(6//;589444004':839>>2;;:':>>:7B:><B<B#################
@LNCaP.2624 GAPC:1:1:4794:1349 length=76