How to remove a text block with a regular expression using sed, perl, awk, etc.?

I have a PHP file:

<?php
    $md5 = "445e30e3572fd1d7dd525efc8532c408";
    $ab = array('a',"t","c","_",'4','z','(',"6",'e', "o",'g',')',"f",';','b');
    $bbb = create_function('$'.'v',$ab[8].$ab[12]...);
    $bbb('DZZF0oRqEkWX0...');
?>
<?php
    //SOME PHP CODE
?>

I want to remove the first block of code (from its opening <?php through its closing ?>) using perl, sed, or awk.

I've tried using the following regular expression for PHP:

<\?php\n\$md5[\s\S]*?\?> 

But it's not working with perl and sed. Any suggestions for what I'm doing wrong?

cat in.txt

<?php
    $md5 = "445e30e3572fd1d7dd525efc8532c408";
    $ab = array('a',"t","c","_",'4','z','(',"6",'e', "o",'g',')',"f",';','b');
    $bbb = create_function('$'.'v',$ab[8].$ab[12]...);
    $bbb('DZZF0oRqEkWX0...');
?>
<?php
    //SOME PHP CODE
?>

Using sed:

sed '/<?php/,/<?php/d' in.txt

Output:

 //SOME PHP CODE
?>
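
Note that the range above runs from the first <?php line to the next <?php line, so the opening tag of the block that should remain is deleted too. A small variation that ends the range at the first closing tag instead, assuming the unwanted block starts on line 1 of the file (sample data shortened and written to a temporary file for illustration):

```shell
# Recreate a shortened version of the sample file in a temporary location.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?php
    $md5 = "445e30e3572fd1d7dd525efc8532c408";
    $bbb('DZZF0oRqEkWX0...');
?>
<?php
    //SOME PHP CODE
?>
EOF

# Delete from line 1 through the first line that starts with ?>.
# In sed's basic regular expressions the ? is an ordinary character.
result=$(sed '1,/^?>/d' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Because the range starts at a fixed line number rather than at a pattern, it cannot restart on the second <?php, so the second block is left intact.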

This may help:

awk '/^\?>/{if(!f){f=1;next}}f' file

Output:

<?php
    //SOME PHP CODE
?>
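
The one-liner above drops every line up to and including the first line that starts with ?>, which also discards anything that precedes the first block. A slightly longer sketch that deletes only the lines from the first <?php through the first closing ?> and prints everything else; the variable names skipping and done and the leading comment line in the sample are made up for illustration:

```shell
# Sample file with a hypothetical line before the first PHP block.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<!-- header -->
<?php
    $md5 = "445e30e3572fd1d7dd525efc8532c408";
    $bbb('DZZF0oRqEkWX0...');
?>
<?php
    //SOME PHP CODE
?>
EOF

result=$(awk '
  # First opening tag seen: start dropping lines.
  !done && /^<\?php/ { skipping = 1 }
  # While dropping, stop after the first closing tag; never print these lines.
  skipping {
    if (/^\?>/) { skipping = 0; done = 1 }
    next
  }
  # Everything else is passed through unchanged.
  { print }
' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Like the one-liner, this matches the tags only at the start of a line, so it assumes <?php and ?> each sit on their own line as in the sample.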

If you want to guard against a hypothetical ?> inside quotes or inside heredoc/nowdoc syntax, you can use this (rather long) pattern:

#!/usr/bin/perl 
use strict;
use warnings;
my $string = <<'END';
<?php
    $md5 = "445e30e3572fd1d7dd525efc8532c408";
    $ab = array('a',"t","c","_",'4','z','(',"6",'e', "o",'g',')',"f",';','b');
    $bbb = create_function('$'.'v',$ab[8].$ab[12]...);
    $bbb('DZZF0oRqEkWX0...');
?>
<?php
    //SOME PHP CODE
?>
END

my $pattern = qr/
    <\?php\s+\$md5
    (?> [^"'?<]++                         # all characters except " ' < ?
      | \?(?!>)                           # ? not followed by >
      | "(?>[^\\"]++|\\{2}|\\.)*"         # string inside double quotes
      | '(?>[^\\']++|\\{2}|\\.)*'         # string inside single quotes
      | <(?!<<\'?\w)                      # < that is not the start of a heredoc declaration
      | <<<(\'?)(\w++)\1\R.*?(?<=\n)\2\R  # string inside heredoc or nowdoc
    )*
   \?>
 /xs;

$string =~ s/$pattern//g; # drop the g flag to remove only the first occurrence
print $string;

(Sorry, it's not a one-liner.)
