简体   繁体   中英

Get string between - Find all occurrences PHP

I found this function which finds data between two strings of text, html or whatever.

How can it be changed so it will find all occurrences? Every data between every occurrence of $start [some-random-data] $end. I want all the [some-random-data] of the document (It will always be different data).

function getStringBetween($string, $start, $end) {
    $string = " ".$string;
    $ini = strpos($string,$start);
    if ($ini == 0) return "";
    $ini += strlen($start);
    $len = strpos($string,$end,$ini) - $ini;
    return substr($string,$ini,$len);
}

One possible approach:

function getContents($str, $startDelimiter, $endDelimiter) {
  $contents = array();
  $startDelimiterLength = strlen($startDelimiter);
  $endDelimiterLength = strlen($endDelimiter);
  $startFrom = $contentStart = $contentEnd = 0;
  while (false !== ($contentStart = strpos($str, $startDelimiter, $startFrom))) {
    $contentStart += $startDelimiterLength;
    $contentEnd = strpos($str, $endDelimiter, $contentStart);
    if (false === $contentEnd) {
      break;
    }
    $contents[] = substr($str, $contentStart, $contentEnd - $contentStart);
    $startFrom = $contentEnd + $endDelimiterLength;
  }

  return $contents;
}

Usage:

$sample = '<start>One<end>aaa<start>TwoTwo<end>Three<start>Four<end><start>Five<end>';
print_r( getContents($sample, '<start>', '<end>') );
/*
Array
(
    [0] => One
    [1] => TwoTwo
    [2] => Four
    [3] => Five
)
*/ 

Demo .

You can do this using regex:

function getStringsBetween($string, $start, $end)
{
    $pattern = sprintf(
        '/%s(.*?)%s/',
        preg_quote($start),
        preg_quote($end)
    );
    preg_match_all($pattern, $string, $matches);

    return $matches[1];
}

I love to use explode to get string between two string. this function also works for multiple occurrences.

function GetIn($str,$start,$end){
    $p1 = explode($start,$str);
    for($i=1;$i<count($p1);$i++){
        $p2 = explode($end,$p1[$i]);
        $p[] = $p2[0];
    }
    return $p;
}

I needed to find all these occurences between specific first and last tag and change them somehow and get back changed string.

So I added this small code to raina77ow approach after the function.

        $sample = '<start>One<end> aaa <start>TwoTwo<end> Three <start>Four<end> aaaaa <start>Five<end>';
        $sample_temp = getContents($sample, '<start>', '<end>');
        $i = 1;
        foreach($sample_temp as $value) {
            $value2 = $value.'-'.$i; //there you can change the variable
            $sample=str_replace('<start>'.$value.'<end>',$value2,$sample);
            $i = ++$i;
        }
        echo $sample;

Now output sample has deleted tags and all strings between them has added number like this:

One-1 aaa TwoTwo-2 Three Four-3 aaaaa Five-4

But you can do whatever else with them. Maybe could be helpful for someone.

I also needed the text outside the pattern. So I changed the answer from raina77ow above a little:

function get_delimited_strings($str, $startDelimiter, $endDelimiter) {
    $contents = array();
    $startDelimiterLength = strlen($startDelimiter);
    $endDelimiterLength = strlen($endDelimiter);
    $startFrom = $contentStart = $contentEnd = $outStart = $outEnd = 0;
    while (false !== ($contentStart = strpos($str, $startDelimiter, $startFrom))) {
        $contentStart += $startDelimiterLength;
        $contentEnd = strpos($str, $endDelimiter, $contentStart);
        $outEnd = $contentStart - 1;
        if (false === $contentEnd) {
            break;
        }
        $contents['in'][] = substr($str, $contentStart, $contentEnd - $contentStart);
        $contents['out'][] = substr($str, $outStart, $outEnd - $outStart);
        $startFrom = $contentEnd + $endDelimiterLength;
        $outStart = $startFrom;
    }
    $contents['out'][] = substr($str, $outStart, $contentEnd - $outStart);
    return $contents;
}

Usage:

    $str = "Bore layer thickness [2 mm] instead of [1,25 mm] with [0,1 mm] deviation.";
    $cas = get_delimited_strings($str, "[", "]");

gives:

array(2) { 
    ["in"]=> array(3) { 
        [0]=> string(4) "2 mm" 
        [1]=> string(7) "1,25 mm" 
        [2]=> string(6) "0,1 mm" 
    } 
    ["out"]=> array(4) { 
        [0]=> string(21) "Bore layer thickness " 
        [1]=> string(12) " instead of " 
        [2]=> string(6) " with " 
        [3]=> string(10) " deviation" 
    } 
}

There was some great sollutions here, however not perfekt for extracting parts of code from say HTML which was my problem right now, as I need to get script blocks out of the HTML before compressing the HTML. So building on @raina77ow original sollution, expanded by @Cas Tuyn I get this one:

$test_strings = [
    '0<p>a</p>1<p>b</p>2<p>c</p>3',
    '0<p>a</p>1<p>b</p>2<p>c</p>',
    '<p>a</p>1<p>b</p>2<p>c</p>3',
    '<p>a</p>1<p>b</p>2<p>c</p>',
    '<p></p>1<p>b'
];

/**
* Seperate a block of code by sub blocks. Example, removing all <script>...<script> tags from HTML kode
* 
* @param string $str, text block
* @param string $startDelimiter, string to match for start of block to be extracted
* @param string $endDelimiter, string to match for ending the block to be extracted
* @return array [all full blocks, whats left of string]
*/
function getDelimitedStrings($str, $startDelimiter, $endDelimiter) {
    $contents = array();
    $startDelimiterLength = strlen($startDelimiter);
    $endDelimiterLength = strlen($endDelimiter);
    $startFrom = $contentStart = $contentEnd = $outStart = $outEnd = 0;
    while (false !== ($contentStart = strpos($str, $startDelimiter, $startFrom))) {
        $contentStart += $startDelimiterLength;
        $contentEnd = strpos($str, $endDelimiter, $contentStart);
        $outEnd = $contentStart - 1;
        if (false === $contentEnd) {
            break;
        }
        $contents['in'][] = substr($str, ($contentStart-$startDelimiterLength), ($contentEnd + ($startDelimiterLength*2) +1) - $contentStart);
        if( $outStart ){
            $contents['out'][] = substr($str, ($outStart+$startDelimiterLength+1), $outEnd - $outStart - ($startDelimiterLength*2));
        } else if( ($outEnd - $outStart - ($startDelimiterLength-1)) > 0 ){
            $contents['out'][] = substr($str, $outStart, $outEnd - $outStart - ($startDelimiterLength-1));
        }
        $startFrom = $contentEnd + $endDelimiterLength;
        $startFrom = $contentEnd;
        $outStart = $startFrom;
    }
    $total_length = strlen($str);
    $current_position = $outStart + $startDelimiterLength + 1;
    if( $current_position < $total_length )
        $contents['out'][] = substr($str, $current_position);

    return $contents;
}

foreach($test_strings AS $string){
    var_dump( getDelimitedStrings($string, '<p>', '</p>') );
}

This will extract all

wlements with the possible innerHTML aswell, giving this result:

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=4)
    0 => string '0' (length=1)
    1 => string '1' (length=1)
    2 => string '2' (length=1)
    3 => string '3' (length=1)

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=3)
    0 => string '0' (length=1)
    1 => string '1' (length=1)
    2 => string '2' (length=1)

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=3)
    0 => string '1' (length=1)
    1 => string '2' (length=1)
    2 => string '3' (length=1)

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=2)
    0 => string '1' (length=1)
    1 => string '2' (length=1)

array (size=2)
'in' => array (size=1)
    0 => string '<p></p>' (length=7)
'out' => array (size=1)
    0 => string '1<p>b' (length=5)

You can see a demo here: 3v4l.org/TQLmn

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM