
Split string on dots not preceded by a digit without losing digit in split

Given the following sentence:

This is 10. way of doing this. And this is 43. street.

I want preg_split() to give this:

Array (
 [0] => "This is 10. way of doing this"
 [1] => "And this is 43. street"
)

I am using:

preg_split("/[^\d+]\./i", $sentence)

But this gives me:

Array (
 [0] => "This is 10. way of doing thi"
 [1] => "And this is 43. stree"
)

As you can see, the last character of each sentence is removed. I know why this happens, but I don't know how to prevent it from happening. Any ideas? Can lookaheads and lookbehinds help here? I am not really familiar with those.

You want to use a negative lookbehind assertion for that:

preg_split("/(?<!\d)\./i",$sentence)

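The difference is that the character matched by [^\d+] becomes part of the delimiter match, so preg_split() removes it along with the dot. (Inside the brackets the + is a literal plus sign, not a quantifier.) The (?<!\d) assertion is also checked, but it is "zero-width", meaning it does not become part of the delimiter match, and thus nothing extra is thrown away.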
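To make the difference concrete, here is a minimal sketch that runs both patterns side by side. It assumes the sentence from the expected output in the question; the variable names are only illustrative.

```php
<?php
// Sample input, assumed from the question's expected output.
$sentence = 'This is 10. way of doing this. And this is 43. street.';

// Consuming version: [^\d+] matches one character that is neither a digit
// nor a literal "+", so that character is swallowed with the dot.
$consuming = preg_split('/[^\d+]\./', $sentence);

// Lookbehind version: (?<!\d) only inspects the previous character,
// so the delimiter is the dot alone.
$lookbehind = preg_split('/(?<!\d)\./', $sentence);

var_export($consuming);   // 'This is 10. way of doing thi', ' And this is 43. stree', ''
echo PHP_EOL;
var_export($lookbehind);  // 'This is 10. way of doing this', ' And this is 43. street', ''
```

Note that the lookbehind result still carries a leading space on the second element and an empty string at the end, because the dot alone is the delimiter here.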

To split your string on literal dots that are not preceded by a digit, match a non-digit, then reset the start of the full-string match with \K (meaning "keep" everything matched up to here), then match the "disposable" characters: the literal dot and zero or more spaces.

Code:

$string = 'The is 10. way of doing this. And this is 43. street.';
var_export(
    preg_split('~\D\K\. *~', $string, 0, PREG_SPLIT_NO_EMPTY)
);

or

var_export(
    preg_split('~(?<!\d)\. *~', $string, 0, PREG_SPLIT_NO_EMPTY)
);

or

var_export(
    preg_split('~(?<=\D)\. *~', $string, 0, PREG_SPLIT_NO_EMPTY)
);

Output: (all clean, no trailing dots, no trailing spaces, no unexpected lost characters)

array (
  0 => 'The is 10. way of doing this',
  1 => 'And this is 43. street',
)
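The three patterns agree on the sample sentence, but they are not strictly interchangeable. As a rough sketch, assuming a made-up input that begins with a dot: (?<!\d) is satisfied when there is no preceding character at all, whereas \D\K and (?<=\D) both require an actual non-digit character before the dot.

```php
<?php
// Hypothetical edge-case input: the very first character is a dot.
$string = '. Leading dot. Then text.';

// Negative lookbehind: "no digit before" also holds at the start of the string,
// so the leading dot is treated as a delimiter.
$negative = preg_split('~(?<!\d)\. *~', $string, 0, PREG_SPLIT_NO_EMPTY);

// Positive lookbehind and \K both need a real non-digit character before the dot,
// so the leading dot stays in the first element.
$positive = preg_split('~(?<=\D)\. *~', $string, 0, PREG_SPLIT_NO_EMPTY);
$reset    = preg_split('~\D\K\. *~', $string, 0, PREG_SPLIT_NO_EMPTY);

var_export($negative); // 'Leading dot', 'Then text'
echo PHP_EOL;
var_export($positive); // '. Leading dot', 'Then text'
echo PHP_EOL;
var_export($reset);    // '. Leading dot', 'Then text'
```

Which behaviour you want depends on your data; for ordinary sentences like the one in the question, any of the three should do.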
