
Regex multiline match excluding lines containing a string

Given the following text:

EXCLUDE this entire line
include this line
and this as single match
and EXCLUDE this line

I want to return a single match consisting of these two lines:

include this line
and this as single match

I want to use EXCLUDE as the string that marks an entire line to be excluded.

Edit: if I can get just the first match, up to the line containing "EXCLUDE" (or the end of the document, whichever comes first), that would work too.

You can split the string on matches of the regular expression

^.*\bEXCLUDE\b.*\R

with global and multiline flags set.

In Ruby, for example, if the variable str held the string

Firstly include this line
EXCLUDE this entire line
include this line
and this as single match
and EXCLUDE this line
Lastly include this line

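then String#split could be used to produce an array containing three strings. Because the pattern starts at ^, the line break before each EXCLUDE line stays at the end of the preceding piece, so chomp is applied to each result.

str.split(/^.*\bEXCLUDE\b.*\R/).map(&:chomp)
  #=> ["Firstly include this line",
  #    "include this line\nand this as single match",
  #    "Lastly include this line"]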

Many languages have a method or function comparable to Ruby's split.


The regular expression can be broken down as follows.

^        # match the beginning of a line
.*       # match zero or more characters other than line
         # terminators, as many as possible
\b       # match word boundary
EXCLUDE  # match literal
\b       # match word boundary
.*       # match zero or more characters other than line
         # terminators, as many as possible
\R       # match line terminator
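The same breakdown can live inside the pattern itself by using Ruby's extended (x) mode, where whitespace and # comments in the regex are ignored. This is a minimal sketch assuming Ruby 2.0 or later (whose Onigmo engine supports \R); the constant name EXCLUDE_LINE is illustrative only. In Ruby, ^ always matches at the start of any line, so no multiline flag is needed.

```ruby
# EXCLUDE_LINE is an illustrative name; the pattern is ^.*\bEXCLUDE\b.*\R
# written in extended (x) mode, where whitespace and # comments are ignored.
EXCLUDE_LINE = /
  ^            # beginning of a line
  .*           # any characters except a newline, as many as possible
  \bEXCLUDE\b  # the word EXCLUDE between word boundaries
  .*           # the rest of the line
  \R           # the line terminator
/x

str = "Firstly include this line\nEXCLUDE this entire line\n" \
      "include this line\nand this as single match\n" \
      "and EXCLUDE this line\nLastly include this line"

# split leaves a trailing line break on each piece that precedes a match;
# chomp removes it.
blocks = str.split(EXCLUDE_LINE).map(&:chomp)
p blocks
# => ["Firstly include this line", "include this line\nand this as single match", "Lastly include this line"]

# The word boundaries stop a line such as "NOEXCLUDE here" from matching.
p "NOEXCLUDE here\n".match?(EXCLUDE_LINE)  # => false
```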
 

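With PCRE you can use \K to forget what has been matched so far: first match a line containing EXCLUDE, then reset the match start with \K and match one or more following lines that do not contain EXCLUDE: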

^.*\bEXCLUDE\b.*\K(?:\R(?!.*\bEXCLUDE\b).*)+

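Ruby's regex engine also understands \K, so the pattern can be tried directly with String#[]. A minimal sketch, assuming Ruby 2.0 or later (Onigmo supports both \K and \R); the variable names are illustrative. Because \K only moves the start of the reported match, the result begins with the line break that follows the EXCLUDE line, which can be stripped afterwards.

```ruby
text = "EXCLUDE this entire line\ninclude this line\n" \
       "and this as single match\nand EXCLUDE this line"

# Match an EXCLUDE line, discard it with \K, then take one or more following
# lines (each introduced by a line break) that do not contain EXCLUDE.
first_block = text[/^.*\bEXCLUDE\b.*\K(?:\R(?!.*\bEXCLUDE\b).*)+/]
p first_block  # => "\ninclude this line\nand this as single match"

# Strip the leading line break that the match starts with.
p first_block.sub(/\A\R/, "")  # => "include this line\nand this as single match"
```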

If you want to match every run of consecutive lines that do not contain EXCLUDE:

(?:(?:^|\R)(?!.*\bEXCLUDE\b).*)+

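In Ruby, the pattern for consecutive lines can be passed to String#scan to collect every run. A sketch under the same Ruby 2.0+ assumption, with illustrative variable names. Every run that is reached through \R starts with that line break, so it is removed afterwards.

```ruby
text = "Firstly include this line\nEXCLUDE this entire line\n" \
       "include this line\nand this as single match\n" \
       "and EXCLUDE this line\nLastly include this line"

# Each repetition starts at a line start (^) or at a line break (\R)
# and takes one whole line that does not contain EXCLUDE.
runs = text.scan(/(?:(?:^|\R)(?!.*\bEXCLUDE\b).*)+/)
p runs
# => ["Firstly include this line", "\ninclude this line\nand this as single match", "\nLastly include this line"]

# Remove the leading line break, if any, from each run.
blocks = runs.map { |run| run.sub(/\A\R/, "") }
p blocks
# => ["Firstly include this line", "include this line\nand this as single match", "Lastly include this line"]
```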

Or, using PCRE's (*SKIP)(*F) verbs with the m flag set, so that every line containing EXCLUDE is matched and then skipped:

^.*\bEXCLUDE\b.*(*SKIP)(*F)|.+(?:\R(?!.*\bEXCLUDE\b).*)*

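Ruby's regex engine does not accept PCRE's (*SKIP)(*F) verbs. A common substitute, swapped in here rather than taken from the answer, is to match the unwanted EXCLUDE lines in one alternative, capture the wanted blocks in the other, and then discard the empty captures. A sketch assuming Ruby 2.0 or later, with illustrative names:

```ruby
text = "Firstly include this line\nEXCLUDE this entire line\n" \
       "include this line\nand this as single match\n" \
       "and EXCLUDE this line\nLastly include this line"

# Alternative 1 consumes a whole EXCLUDE line without capturing it.
# Alternative 2 is only tried where alternative 1 failed, so its first line
# has no EXCLUDE; the lookahead keeps EXCLUDE out of the following lines.
pattern = /^.*\bEXCLUDE\b.*|(.+(?:\R(?!.*\bEXCLUDE\b).*)*)/

# With one capture group, scan yields one-element arrays; matches of
# alternative 1 yield [nil], which compact removes after flattening.
blocks = text.scan(pattern).flatten.compact
p blocks
# => ["Firstly include this line", "include this line\nand this as single match", "Lastly include this line"]
```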

You could also match the lines containing EXCLUDE and use them to split your text into the blocks you are looking for:

<?php

$input = 'First include this line
EXCLUDE this entire line
include this line
and this as single match
and EXCLUDE this line
Lastly include this line';

// ^ matches the beginning of a line.
// .* matches anything (except new lines) zero or multiple times.
// \b matches a word boundary (to avoid matching NOEXCLUDE).
// $ matches the end of a line.
$pattern = '/^.*\bEXCLUDE\b.*$/m';

// Split the text with all lines containing the EXCLUDE word.
$desired_blocks = preg_split($pattern, $input);

// Get rid of the new lines around the matched blocks.
array_walk(
    $desired_blocks,
    function (&$block) {
        // \R matches any Unicode newline sequence.
        // ^ matches the beginning of the string.
        // $ matches the end of the string.
        // | = or
        $block = preg_replace('/^\R+|\R+$/', '', $block);
    }
);

var_export($desired_blocks);

Demo here: https://onlinephp.io/c/4216a

Output:

array (
  0 => 'First include this line',
  1 => 'include this line
and this as single match',
  2 => 'Lastly include this line',
)
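For comparison, the same split-then-trim idea in Ruby, as a sketch. The helper name exclude_blocks is hypothetical, and the final reject(&:empty?) is an extra step not in the PHP answer: it drops the empty blocks that split produces when two EXCLUDE lines are adjacent or when the text starts with one. In Ruby, ^ and $ always match at line boundaries, so the pattern needs no flag (Ruby's /m would instead make . match newlines).

```ruby
# Hypothetical helper: split on EXCLUDE lines, trim line breaks, drop empties.
def exclude_blocks(text)
  # Split on whole lines containing the word EXCLUDE (line breaks stay behind).
  pieces = text.split(/^.*\bEXCLUDE\b.*$/)
  # Remove line breaks at the start and end of each piece, like preg_replace.
  trimmed = pieces.map { |piece| piece.gsub(/\A\R+|\R+\z/, "") }
  # Extra step: discard pieces that consisted only of line breaks.
  trimmed.reject(&:empty?)
end

input = "First include this line\nEXCLUDE this entire line\n" \
        "include this line\nand this as single match\n" \
        "and EXCLUDE this line\nLastly include this line"

p exclude_blocks(input)
# => ["First include this line", "include this line\nand this as single match", "Lastly include this line"]

# Adjacent EXCLUDE lines would otherwise leave an empty block between them.
p exclude_blocks("a\nEXCLUDE\nEXCLUDE too\nb")  # => ["a", "b"]
```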
