
php - Regex - space and non-capturing group

I have this kind of string:

Blabla1 Blaabla2<br />  Blaabla3 Blaabla4

I'm trying to split it into words wherever there is a " " or a "<br />", using preg_split.

What I expect:

Blabla1
Blaabla2<br />
Blaabla3
Blaabla4

I tried this regex, (?:(<br\s))|\s, but I can't manage to exclude the "/>".

http://regexr.com/3aqs0

Thanks!

One way you could do this:

$str = 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4';
$results = preg_split('~(?:<br[^>]*>\s*\K|\s+)~', $str);
print_r($results);

Output

Array
(
    [0] => Blabla1
    [1] => Blaabla2<br />  
    [2] => Blaabla3
    [3] => Blaabla4
)
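
Note that element [1] above keeps the two spaces that follow <br />. If that trailing whitespace is unwanted, one possible variation (a sketch, not part of the answer above) is to move \K in front of \s*, so the whitespace after the tag becomes part of the delimiter. The variable name $trimmed is only illustrative.

```php
<?php
$str = 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4';

// \K now sits before \s*: the tag stays in the piece, while the
// whitespace after it is consumed as the delimiter.
$trimmed = preg_split('~<br[^>]*>\K\s*|\s+~', $str);

print_r($trimmed);
```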

If there is no other HTML in the string, it's fine to use a regex. Otherwise there are many better ways.

Use <br(\s\/)?>\K|\s:

$matches = preg_split('/<br(\s\/)?>\K|\s/',$string);

This also works for <br> (which is valid HTML too).
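
One caveat: the optional group (\s\/)? requires a space before the slash, so <br/> written without a space is not treated as a tag. A slightly looser pattern, shown here only as a possible variation and not something the question asked for, accepts all three spellings. The sample string and variable names are made up for the illustration.

```php
<?php
// Made-up sample containing <br>, <br/> and <br />.
$sample = 'a<br>b c<br/>d<br />e';

// Pattern from this answer: <br/> (no space) is not recognised.
$strict = preg_split('/<br(\s\/)?>\K|\s/', $sample);

// Looser variation: optional whitespace, then an optional slash.
$loose = preg_split('/<br\s*\/?>\K|\s/', $sample);

print_r($strict);
print_r($loose);
```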

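Consider the PREG_SPLIT_NO_EMPTY flag, because your example string (which has two spaces after the <br />) will otherwise produce empty elements:

// -1 means "no limit"; passing null here is deprecated as of PHP 8.1
$matches = preg_split('/<br(\s\/)?>\K|\s/', $string, -1, PREG_SPLIT_NO_EMPTY);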
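
To see the difference the flag makes, here is a small sketch using the string from the question. The variable names are only illustrative.

```php
<?php
// String from the question; note the two spaces after <br />.
$string  = 'Blabla1 Blaabla2<br />  Blaabla3 Blaabla4';
$pattern = '/<br(\s\/)?>\K|\s/';

// Without the flag, the consecutive spaces produce empty elements.
$withEmpty = preg_split($pattern, $string);

// With the flag, empty elements are dropped from the result.
$withoutEmpty = preg_split($pattern, $string, -1, PREG_SPLIT_NO_EMPTY);

var_dump(in_array('', $withEmpty, true)); // is there an empty element?
print_r($withoutEmpty);
```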

Update: To keep the <br />, you need to reset the match start using \K. There is a good example of this in the language reference:

\K can be used to reset the match start since PHP 5.2.4. For example, the pattern foo\Kbar matches "foobar", but reports that it has matched "bar". The use of \K does not interfere with the setting of captured substrings. For example, when the pattern (foo)\Kbar matches "foobar", the first substring is still set to "foo".
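
The two patterns from that quote can be checked directly with preg_match. This is just a minimal sketch of the quoted behaviour; $m1 and $m2 are arbitrary variable names.

```php
<?php
// foo\Kbar must see "foobar", but only "bar" is reported as the match.
preg_match('/foo\Kbar/', 'foobar', $m1);

// The capture group is still filled even though \K reset the match start.
preg_match('/(foo)\Kbar/', 'foobar', $m2);

print_r($m1); // whole match only
print_r($m2); // whole match plus group 1
```

In the preg_split call above, the same reset is what keeps the <br /> inside the preceding piece: the tag has to be matched, but it is not reported as part of the delimiter.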


 