
Perl: how to remove a particular word or pattern between two patterns

I want to remove certain words that occur between two patterns, using Perl.

The following is my text

..........

QWWK jhjh  kljdfh jklh jskdhf jkh PQXY
lhj ah jh sdlkjh PQXY jha slkdjh 
PQXY jh alkjh ljk
kjhaksj dkjhsd KWWQ
hahs dkj h PQXY
.........

Now I want to remove all PQXY words, but only those that lie between the two patterns ^QWWK and KWWQ$.

I know how to replace everything between the two patterns, using the following command:

perl -0777pe 's/^QWWK(?:(?!QWWK|KWWQ).)*KWWQ$/sometext/gms' filename

Also note that the pattern ^QWWK(?:(?!QWWK|KWWQ).)*KWWQ$ only matches blocks that have no QWWK or KWWQ in between.

You can use the range operator:

perl -pe 's/PQXY//g if /^QWWK/ .. /KWWQ$/'

Here is the approach you've tried, with the little more that is needed for it to work:

perl -0777 -wpe's{^(QWWK (?:(?!QWWK|KWWQ).)*? KWWQ)$}{ $1 =~ s/PQXY//gr }egmsx' file

The /e modifier makes it evaluate the replacement side as code, and we run a regex there.

In that regex the /r modifier makes it return the changed string (and not change the original, which allows us to run it on $1, which is read-only).

The requirement that the block of text from ^QWWK to KWWQ$ must not contain either of these phrases is satisfied by the code above, but a few comments may be helpful.

Strictly speaking, the non-greedy *? is not needed: because of the negative lookahead, the greedy * also stops at the first KWWQ. But that is tricky to ascertain, and a greedy .* in general has the potential to slurp up everything up to the very last KWWQ, including all other possible blocks and any text between them.

Altogether I just find .*? safer and simpler, especially as that is what is needed.

The QWWK must start a line (it's given with ^ in the question) to be a marker for a block. If an extra QWWK is found inside the block then the whole block does not match. But if that "extra" QWWK inside happens to be at the beginning of a line, then

  • what would've been a block doesn't match, since there is QWWK inside

  • a block is in fact matched beginning with that QWWK

I use /x above so as to be able to space out the pattern for readability.

Update: To replace PQXY only if QWWK or KWWQ are NOT present between ^QWWK and KWWQ$ give this a try:

perl -pe 'if (/^QWWK/ .. /KWWQ$/) {s/PQXY//g if ! /.+QWWK/ && !/KWWQ.+/}' filename

I'm sure it can be cleaned up / golfed, however I think it will give you what you are asking for.

If I understand your question correctly, this may be clearer with tools other than regexes. Note that the following collapses any whitespace between words to a single space.

Input qwwk.txt (with one line added)

..........

QWWK jhjh  kljdfh jklh jskdhf jkh PQXY
lhj ah jh sdlkjh PQXY jha slkdjh
PQXY jh alkjh ljk
kjhaksj dkjhsd KWWQ
hahs dkj h PQXY
.........

KWWQ in mid line doesn't trigger: QWWK a PQXY b KWWQ c QWWK d PQXY e KWWQ

Command perl qwwk.pl qwwk.txt

Output

..........

QWWK jhjh kljdfh jklh jskdhf jkh
lhj ah jh sdlkjh jha slkdjh
jh alkjh ljk
kjhaksj dkjhsd KWWQ
hahs dkj h PQXY
.........

KWWQ in mid line doesn't trigger: QWWK a PQXY b KWWQ c QWWK d PQXY e KWWQ

Program qwwk.pl

use strict; use warnings;
while(<>) {             # for each line
    my @out;
    my @words=split;    # get its words

    for my $i (0..$#words) {
        my $w=$words[$i];
        my $active = ($i==0 && $w eq q(QWWK)) .. ($i==$#words && $w eq q(KWWQ));
            # Keep track of where we are.  See notes below.
        push @out, $w unless $active and ($w eq q(PQXY));
            # Save words we want to keep
    } #foreach word

    print join(q( ), @out), qq(\n);     # Print the words we saved
} #foreach line

The key is that the flip-flop ( .. ) operator in the $active= FOO .. BAR assignment keeps its state regardless of what is happening around it. It will be true from a QWWK at the start of a line ( ($i==0 && $w eq q(QWWK)) ) to a KWWQ at the end of the line ( ($i==$#words && $w eq q(KWWQ)) ), regardless of how many lines intervene.

As a one-liner

perl -Mstrict -Mwarnings -ne 'my @out; my @words=split; for my $i (0..$#words) { my $w=$words[$i]; my $active = ($i==0 && $w eq q(QWWK)) .. ($i==$#words && $w eq q(KWWQ)); push @out, $w unless $active and ($w eq q(PQXY)); } print join(q( ), @out), qq(\n);' qwwk.txt

The difference here is that -n provides the while(<>){} loop, so that's not included in the -e script. (Plus, now you know why I used q() and qq() in the standalone program ;) .)
