perl: how to remove a particular word or pattern between two patterns

I want to remove some words between two patterns using Perl.

The following is my text:

..........

QWWK jhjh  kljdfh jklh jskdhf jkh PQXY
lhj ah jh sdlkjh PQXY jha slkdjh 
PQXY jh alkjh ljk
kjhaksj dkjhsd KWWQ
hahs dkj h PQXY
.........

Now I want to remove all PQXY words that lie between the two patterns ^QWWK and KWWQ$, and only those.

I know how to replace the whole thing between the two patterns with the following command:

perl -0777pe 's/^QWWK(?:(?!QWWK|KWWQ).)*KWWQ$/sometext/gms' filename

Also note that the pattern ^QWWK(?:(?!QWWK|KWWQ).)*KWWQ$ only matches blocks that have no QWWK or KWWQ in between.

You can use the range operator:

perl -pe 's/PQXY//g if /^QWWK/ .. /KWWQ$/'
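The same idea can be sketched as a small script, which makes the line-by-line behaviour easier to see. The sub name strip_in_range and the sample lines are made up for illustration; only the s/PQXY//g if /^QWWK/ .. /KWWQ$/ statement comes from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: remove PQXY from every line between a line that
# starts with QWWK and a line that ends with KWWQ (both inclusive).
sub strip_in_range {
    my @lines = @_;
    for (@lines) {
        # The range operator turns true on /^QWWK/ and stays true
        # through the line that matches /KWWQ$/.
        s/PQXY//g if /^QWWK/ .. /KWWQ$/;
    }
    return @lines;
}

# Made-up sample: the block closes on the third line, so the range
# operator is back to "off" before the last line is examined.
my @sample = (
    "QWWK jhjh PQXY",
    "lhj PQXY jha",
    "kjhaksj dkjhsd KWWQ",
    "hahs dkj h PQXY",
);
print "$_\n" for strip_in_range(@sample);
```

Only the last line keeps its PQXY. Note that the spaces around each removed word are left in place, and that the range operator does not check for a stray QWWK or KWWQ inside the block.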

Here is the approach you've tried, with the little more needed for it to work:

perl -0777 -wpe's{^(QWWK (?:(?!QWWK|KWWQ).)*? KWWQ)$}{ $1 =~ s/PQXY//gr }egmsx' file

The /e modifier makes it evaluate the replacement side as code, and we run a regex there.

In that regex the /r modifier makes it return the changed string (and not change the original, which allows us to run it on $1, which is read-only).
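A minimal sketch of how /e and /r work together, stripped of the line anchors and the lookahead so that only the two modifiers are on show. The sub name strip_inside and the sample string are hypothetical, and /r assumes Perl 5.14 or newer.

```perl
use strict;
use warnings;

# Hypothetical helper: strip PQXY only inside a QWWK ... KWWQ stretch.
sub strip_inside {
    my ($text) = @_;
    # /e runs the replacement side as Perl code; the inner s///r returns
    # a modified copy of $1 instead of trying to change $1 in place.
    $text =~ s{(QWWK.*?KWWQ)}{ $1 =~ s/PQXY//gr }egs;
    return $text;
}

# The PQXY before QWWK and the one after KWWQ are outside the match.
print strip_inside("PQXY QWWK a PQXY b KWWQ PQXY"), "\n";
```

Without /r the inner substitution would attempt to modify $1 directly, which Perl rejects because $1 is read-only.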

The requirement that the ^QWWK-to-KWWQ$ block of text not contain either of these phrases is satisfied by the code above, but a few comments may be helpful.

We don't need the non-greedy .*? since .* (following the negative lookahead) actually stops at KWWQ$ . But this is tricky to ascertain, and a plain .* has the potential to slurp up everything up to the very last KWWQ , including all other possible blocks and any text between them.

Altogether I just find .*? safer and simpler, especially as that is what is needed.
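The difference can be checked on a small made-up string with two blocks and some text between them. The variable names are illustrative; the third pattern is the one from the question.

```perl
use strict;
use warnings;

# Two blocks with unrelated text between them (sample data, not from the question).
my $text = "QWWK a KWWQ\nmiddle\nQWWK b KWWQ\n";

# A plain greedy .* with /s runs on to the last KWWQ that ends a line.
my ($greedy)   = $text =~ /^(QWWK.*KWWQ)$/ms;
# The non-greedy .*? stops at the first KWWQ that ends a line.
my ($lazy)     = $text =~ /^(QWWK.*?KWWQ)$/ms;
# With the negative lookahead, even a greedy * cannot step over a marker.
my ($tempered) = $text =~ /^(QWWK(?:(?!QWWK|KWWQ).)*KWWQ)$/ms;

# Lengths of the three captures.
print length($greedy), " ", length($lazy), " ", length($tempered), "\n";
# prints: 30 11 11
```

The greedy capture swallows both blocks and the "middle" line, while the other two stop at the end of the first block.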

The QWWK must start a line (it's given with ^ in the question) to be a marker for a block. If an extra QWWK is found inside the block then the whole block does not match. But if that "extra" QWWK inside happens to be at the beginning of a line, then

  • what would've been a block doesn't match, since there is a QWWK inside

  • a block is in fact matched beginning with that QWWK
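Both cases can be tried on made-up input. The sketch below wraps the substitution from the one-liner above in a sub (the name strip_blocks and the sample strings are hypothetical) and feeds it one text with an extra QWWK at the start of a line and one with an extra QWWK in mid line.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution shown in the one-liner.
sub strip_blocks {
    my ($text) = @_;
    $text =~ s{^(QWWK (?:(?!QWWK|KWWQ).)*? KWWQ)$}{ $1 =~ s/PQXY//gr }egmsx;
    return $text;
}

# Extra QWWK at the start of line 2: the block that would have begun on
# line 1 is rejected, and a block is matched from line 2 instead, so only
# the PQXY on line 2 is removed.
my $at_line_start = "QWWK a PQXY\nQWWK b PQXY\nc KWWQ\n";
print strip_blocks($at_line_start);

# Extra QWWK in mid line: nothing matches and the text is left unchanged.
my $mid_line = "QWWK a PQXY\nb QWWK PQXY\nc KWWQ\n";
print strip_blocks($mid_line);
```

Whether the first behaviour is wanted depends on the data; if a nested QWWK at the start of a line should invalidate everything up to the closing KWWQ, the pattern needs extra handling.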

I use /x above to be able to space out the pattern for readability.

Update: To replace PQXY only if QWWK or KWWQ are NOT present between ^QWWK and KWWQ$ , give this a try:

perl -pe 'if (/^QWWK/ .. /KWWQ$/) {s/PQXY//g if ! /.+QWWK/ && !/KWWQ.+/}' filename

I'm sure it can be cleaned up / golfed; however, I think it will give you what you are asking for.

If I understand your question correctly, this may be clearer with tools other than regexes. Note that the following collapses any whitespace between words to a single space.

Input qwwk.txt (with one line added)

..........

QWWK jhjh  kljdfh jklh jskdhf jkh PQXY
lhj ah jh sdlkjh PQXY jha slkdjh
PQXY jh alkjh ljk
kjhaksj dkjhsd KWWQ
hahs dkj h PQXY
.........

KWWQ in mid line doesn't trigger: QWWK a PQXY b KWWQ c QWWK d PQXY e KWWQ

Command: perl qwwk.pl qwwk.txt

Output

..........

QWWK jhjh kljdfh jklh jskdhf jkh
lhj ah jh sdlkjh jha slkdjh
jh alkjh ljk
kjhaksj dkjhsd KWWQ
hahs dkj h PQXY
.........

KWWQ in mid line doesn't trigger: QWWK a PQXY b KWWQ c QWWK d PQXY e KWWQ

Program qwwk.pl

use strict; use warnings;
while(<>) {             # for each line
    my @out;
    my @words=split;    # get its words

    for my $i (0..$#words) {
        my $w=$words[$i];
        my $active = ($i==0 && $w eq q(QWWK)) .. ($i==$#words && $w eq q(KWWQ));
            # Keep track of where we are.  See notes below.
        push @out, $w unless $active and ($w eq q(PQXY));
            # Save words we want to keep
    } #foreach word

    print join(q( ), @out), qq(\n);     # Print the words we saved
} #foreach line

The key is that the flip-flop ( .. ) operator in the $active = FOO .. BAR assignment keeps its state regardless of what is happening around it. It will be true from a QWWK at the start of a line ( ($i==0 && $w eq q(QWWK)) ) to a KWWQ at the end of a line ( ($i==$#words && $w eq q(KWWQ)) ), regardless of how many lines intervene.
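The state-keeping can be seen in isolation with a made-up word list, where START and END are placeholder markers standing in for QWWK and KWWQ:

```perl
use strict;
use warnings;

# Placeholder markers; any words would do.
my @words = qw(x START a b END y START c END z);

my @inside;
for my $w (@words) {
    # The same operator is evaluated on every pass through the loop,
    # so its on/off state carries over from one word to the next.
    my $active = ($w eq 'START') .. ($w eq 'END');
    push @inside, $w if $active;
}

# Words seen while the flip-flop was on, markers included.
print "@inside\n";
# prints: START a b END START c END
```

The words x, y and z fall outside both ranges and are skipped, even though nothing in the loop body records the state explicitly.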

As a one-liner

perl -Mstrict -Mwarnings -ne 'my @out; my @words=split; for my $i (0..$#words) { my $w=$words[$i]; my $active = ($i==0 && $w eq q(QWWK)) .. ($i==$#words && $w eq q(KWWQ)); push @out, $w unless $active and ($w eq q(PQXY)); } print join(q( ), @out), qq(\n);' qwwk.txt

The difference here is that -n provides the while(<>){} loop, so that's not included in the -e script. (Plus, now you know why I used q() and qq() in the standalone program ;) .)
