
regexp to split a string using comma(,) delimiter but ignore if the comma is in curly braces{,}

I need a regexp that splits a string on the comma (,) delimiter but ignores any comma inside curly braces {,}, as in the example below:

"asd", domain={"id"="test"}, names={"index"="user.all", "show"="user.view"}, test="test"

Expected result:

"asd"
domain={"id"="test"}
names={"index"="user.all", "show"="user.view"}
test="test"

What I get instead (not wanted):

"asd"
domain={"id"="test"}
names={"index"="user.all"
"show"="user.view"}
test="test"

I tried this, but it also splits on the comma inside the braces {,}:

\{[^}]*}|[^,]+

But I am totally clueless as to how this should end up. Any help would be appreciated!

You can use the following regex for splitting:

(,)(?=(?:[^}]|{[^{]*})*$)

So, using preg_split, you can do it like this (print_r is used because preg_split returns an array, which echo cannot display):

print_r(preg_split('/(,)(?=(?:[^}]|{[^{]*})*$)/', $your_string));
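As a minimal sketch of how this might be used on the sample string from the question: the variable names $input and $parts are placeholders, the capture group around the comma is dropped because it is not needed for splitting, and the trim step is an addition that removes the space left after each comma.

```php
<?php
// Sample input taken from the question.
$input = '"asd", domain={"id"="test"}, names={"index"="user.all", "show"="user.view"}, test="test"';

// A comma is a split point only if every "}" after it
// is preceded by its own "{" (i.e. the comma is not inside braces).
$parts = preg_split('/,(?=(?:[^}]|{[^{]*})*$)/', $input);

// Assumption: the space that follows each comma is not wanted.
$parts = array_map('trim', $parts);

print_r($parts);
// Values: "asd" / domain={"id"="test"} /
//         names={"index"="user.all", "show"="user.view"} / test="test"
```

Note that this sketch assumes the braces are never nested; nested braces would probably need a different approach.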


I see two possibilities (that don't crash with a long string):

The first, with preg_match_all:

$pattern = '~
(?:
    \G(?!\A), # contiguous with the previous match, not at the start of the string
  |           # OR
    \A ,??    # at the start of the string or after the first match when
              # it is empty
)\K           # discard characters on the left from match result
[^{,]*+       # all that is not a { or a ,
(?:
    {[^}]*}? [^{,]* # a string enclosed between curly brackets until a , or a {
                    # or an unclosed opening curly bracket until the end
)*+
~sx';

if (preg_match_all($pattern, $str, $m))
    print_r($m[0]);
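For a quick check, here is a sketch that runs the same pattern, written on one line without the x modifier and the comments, against the sample string from the question. The contents of $str and the trim step are assumptions added for illustration.

```php
<?php
// Sample input taken from the question.
$str = '"asd", domain={"id"="test"}, names={"index"="user.all", "show"="user.view"}, test="test"';

// Same pattern as above, with whitespace and comments removed.
$pattern = '~(?:\G(?!\A),|\A,??)\K[^{,]*+(?:{[^}]*}?[^{,]*)*+~s';

$items = [];
if (preg_match_all($pattern, $str, $m)) {
    // Each match still starts with the space that followed the comma.
    $items = array_map('trim', $m[0]);
}

print_r($items);
// Values: "asd" / domain={"id"="test"} /
//         names={"index"="user.all", "show"="user.view"} / test="test"
```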

The second, with preg_split and backtracking control verbs to skip the parts enclosed between curly brackets (shorter, but less efficient with long strings):

$pattern = '~{[^}]*}?(*SKIP)(*F)|,~';
print_r(preg_split($pattern, $str));

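(*F) forces the current match attempt to fail, and (*SKIP) makes the regex engine resume the search after the text already matched instead of retrying inside it. As a result, everything from a { to its closing } is skipped, and only the commas outside the brackets are used as split points.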
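A small sketch of this pattern in action, using the sample string from the question plus a second, made-up input with an unclosed bracket ($str, $unclosed and the trim step are illustrative assumptions):

```php
<?php
$pattern = '~{[^}]*}?(*SKIP)(*F)|,~';

// Sample input taken from the question.
$str = '"asd", domain={"id"="test"}, names={"index"="user.all", "show"="user.view"}, test="test"';
$parts = array_map('trim', preg_split($pattern, $str));
print_r($parts);
// Values: "asd" / domain={"id"="test"} /
//         names={"index"="user.all", "show"="user.view"} / test="test"

// Hypothetical input with an unclosed "{": because the closing "}" is
// optional in the pattern, everything after the "{" is skipped as well.
$unclosed = 'a, b={x, y';
$unclosedParts = array_map('trim', preg_split($pattern, $unclosed));
print_r($unclosedParts);
// Values: a / b={x, y
```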

The weakness of this last approach is that the pattern starts with an alternation. This means that at every position where the character is neither a { nor a , both branches of the alternation are tried, only to fail. On PHP versions before 7.3 you can improve the pattern with the S (study) modifier (from PHP 7.3 on, this modifier has no effect):

$pattern = '~{[^}]*}?(*SKIP)(*F)|,~S';

or you can write it without an alternation, like this:

$pattern = '~[{,](?:(?<={)[^}]*}?(*SKIP)(*F))?~';

This way, the positions of a { or a , are located first, using an algorithm that is faster than the regex engine's normal character-by-character walk.
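To see that the two forms are interchangeable on the question's input, a sketch like the following can be used; the variable names are placeholders and the comparison only covers this one sample string, so it is a sanity check rather than a proof of equivalence.

```php
<?php
// Sample input taken from the question.
$str = '"asd", domain={"id"="test"}, names={"index"="user.all", "show"="user.view"}, test="test"';

$withAlternation    = '~{[^}]*}?(*SKIP)(*F)|,~';
$withoutAlternation = '~[{,](?:(?<={)[^}]*}?(*SKIP)(*F))?~';

$a = preg_split($withAlternation, $str);
$b = preg_split($withoutAlternation, $str);

// Both patterns split at the same three commas.
var_dump($a === $b);

// Assumption: the space that follows each comma is not wanted.
$trimmed = array_map('trim', $b);
print_r($trimmed);
```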
