
Multiple patterns within regex

I have a JSON string and I need to match all "text" keys as well as all "html" keys.

For example, the json could be like below:

[{
"layout":12,
"text":"Lorem",
"html":"<div>Ipsum</div>"
}]

Or it could be like below:

[{
"layout":12,
"settings":{
    "text":"Lorem",
    "atts":{
        "html":"<div>Ipsum</div>"
    }
}
}]

The JSON does not always use the same structure, so I have to match the keys and get their values using preg_match_all. I have tried the following to get the value of the "text" key:

preg_match_all('|"text":"([^"]*)"|',$json,$match_txt,PREG_SET_ORDER);

The above works fine for matching a single key. When it comes to matching a second key ("html" in this case) it just doesn't work. I have tried the following:

preg_match_all('|"text|html":"([^"]*)"|',$json,$match_txt,PREG_SET_ORDER);

Can you please give me some hints as to why the OR operator (text|html) doesn't work? Strangely, the above (multi-pattern) regex works fine when I test it in an online tester, but it doesn't work in my PHP files.

Fixing text|html

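You should put text|html in a group. Without one, the alternation applies to the whole pattern, so it looks for either "text or html":"([^"]*)", and only the second branch captures a value.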

|"(text|html)":"([^"]*)"|

Delimiters

This won't work with your current delimiters, though, because you use the pipe (|) inside the expression. You should change your delimiters to something else; here I've used /.

/"(text|html)":"([^"]*)"/
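As a rough illustration of the grouped pattern with / delimiters, the sketch below runs it against the nested structure from the question. The variable names ($json, $matches, $pairs) are only illustrative, and the pattern assumes there is no whitespace around the colon and no escaped quotes in the values.

```php
<?php
// Sample input taken from the question (second, nested structure).
$json = '[{"layout":12,"settings":{"text":"Lorem","atts":{"html":"<div>Ipsum</div>"}}}]';

// Group 1 captures the key name, group 2 captures the value.
preg_match_all('/"(text|html)":"([^"]*)"/', $json, $matches, PREG_SET_ORDER);

// Keep a list of [key, value] pairs so repeated keys are not overwritten.
$pairs = [];
foreach ($matches as $m) {
    $pairs[] = [$m[1], $m[2]];
}

print_r($pairs);
```

With PREG_SET_ORDER, each entry of $matches holds the full match at index 0 followed by the groups, which is why the key is read from index 1 and the value from index 2.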

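Keeping the pipe as the delimiter and escaping the pipe inside the expression does not help here. PHP leaves the backslash in place when it hands the pattern to PCRE, so \| matches a literal | character instead of acting as the OR operator. The following compiles, but it only matches a key that is literally named text|html:

|"(text\|html)":"([^"]*)"|

preg_quote() is not a fix either. It escapes every regex metacharacter in the string (the parentheses, the brackets, * and | included), so the whole expression would be matched as literal text. Reserve it for literal strings that you embed in a pattern, and choose a delimiter that does not occur in the expression.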
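If the key names come from a variable, preg_quote() is better applied to each name individually, leaving the | separators between them unescaped. A minimal sketch, assuming / as the delimiter; buildKeyPattern() is a hypothetical helper, not something from the question.

```php
<?php
// Hypothetical helper: builds a pattern matching any of the given key names.
function buildKeyPattern(array $keys): string
{
    // Quote each key separately; '/' is passed so the delimiter is escaped too.
    $quoted = array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keys);

    // The joining pipes stay unescaped, so they still act as alternation.
    return '/"(' . implode('|', $quoted) . ')":"([^"]*)"/';
}

$json = '[{"layout":12,"text":"Lorem","html":"<div>Ipsum</div>"}]';
$pattern = buildKeyPattern(['text', 'html']);

preg_match_all($pattern, $json, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

Quoting key by key keeps any special characters in a key name literal without turning the alternation itself into literal text.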

Parsing JSON

Although that regex will work, its results need additional processing, and it makes more sense to decode the JSON and use a recursive function for this.

json_decode() decodes a JSON string into the corresponding PHP data types. In the example below I've passed true as the second argument, which means I get an associative array where I would normally get an object.

When findKeyData() is called, it recursively works through all of the data until it finds the specified key, and returns that key's value. If the key is not found, it returns null.

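function findKeyData($data, $key) {
    foreach ($data as $k => $v) {
        // Search nested arrays first.
        if (is_array($v)) {
            $found = findKeyData($v, $key);
            if (! is_null($found)) {
                return $found;
            }
        }
        // Strict comparison: in PHP 7 and earlier, 0 == 'text' is true,
        // so == would wrongly match the numeric index of a JSON list.
        if ($k === $key) {
            return $v;
        }
    }
    return null;
}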

$json1 = json_decode('[{
"layout":12,
    "text":"Lorem",
    "html":"<div>Ipsum</div>"
    }]', true);
$json2 = json_decode('[{
"layout":12,
    "settings":{
    "text":"Lorem",
    "atts":{
        "html":"<div>Ipsum</div>"
    }
}
}]', true);

var_dump(findKeyData($json1, 'text')); // string(5) "Lorem"
var_dump(findKeyData($json1, 'html')); // string(16) "<div>Ipsum</div>"
var_dump(findKeyData($json2, 'text')); // string(5) "Lorem"
var_dump(findKeyData($json2, 'html')); // string(16) "<div>Ipsum</div>"
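
A recursive lookup of this kind stops at the first match. Since the question asks for all "text" and "html" keys, a variant that collects every occurrence may be closer to what is needed. This is only a sketch: findAllKeyData() is a hypothetical name, and it assumes the JSON has been decoded into arrays with json_decode($json, true).

```php
<?php
// Hypothetical helper: collects every value stored under any of $keys.
function findAllKeyData(array $data, array $keys): array
{
    $found = [];
    foreach ($data as $k => $v) {
        // Strict in_array() so numeric list indexes never match a string key.
        if (in_array($k, $keys, true)) {
            $found[] = [$k, $v];
        }
        // Descend into nested arrays and append whatever they contain.
        if (is_array($v)) {
            $found = array_merge($found, findAllKeyData($v, $keys));
        }
    }
    return $found;
}

$data = json_decode('[{"layout":12,"settings":{"text":"Lorem","atts":{"html":"<div>Ipsum</div>"}}}]', true);
$result = findAllKeyData($data, ['text', 'html']);
print_r($result);
```

Returning a list of [key, value] pairs rather than a key-indexed array keeps duplicates when the same key appears at several levels.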

preg_match_all('/"(?:text|html)":"([^"]*)"/', $json, $match_txt, PREG_SET_ORDER);

// Loop over the matches instead of assuming exactly two of them exist.
foreach ($match_txt as $m) {
    print $m[0] . " with group 1: " . $m[1] . "\n";
}

This prints:

$ php -f test.php
"text":"Lorem" with group 1: Lorem
"html":"<div>Ipsum</div>" with group 1: <div>Ipsum</div>

The enclosing parentheses are needed: (?:text|html). Without them the alternation splits the whole pattern, and I couldn't get it to work on https://regex101.com either. The ?: means the content of the parentheses is not captured (i.e. it is not available as a group in the results).

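I also replaced the pipe (|) delimiters with forward slashes, since the regex itself contains a pipe. Escaping the inner pipe while keeping the pipe delimiters, as in |"(?:text\|html)":"([^"]*)"|, is not an equivalent alternative: the escaped pipe then matches a literal | character and no longer works as alternation.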

I don't see any reason to use a regex to parse a valid JSON string:

// array_walk_recursive() takes its array by reference, so pass a variable.
$data = json_decode($json, true);
array_walk_recursive($data, function ($v, $k) {
    if (in_array($k, ['text', 'html'], true)) {
        echo "$k -> $v\n";
    }
});
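
One concrete case where the two approaches diverge is a value that contains an escaped double quote. The sketch below compares a [^"]* style pattern with json_decode() on such a value; the sample string and the variable names are made up for illustration.

```php
<?php
// A value containing escaped double quotes (valid JSON).
// In a single-quoted PHP string, \" stays as a backslash followed by a quote.
$json = '{"text":"say \"hi\""}';

// The character class [^"]* stops at the first quote character it meets,
// which here is the escaped one inside the value.
preg_match('/"(?:text|html)":"([^"]*)"/', $json, $m);
$viaRegex = $m[1];

// The JSON parser understands the escape sequence and returns the full value.
$viaDecode = json_decode($json, true)['text'];

var_dump($viaRegex, $viaDecode);
```

The regex result is cut off at the backslash, while the decoded value contains the whole string with the quotes unescaped.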


You are using the pipe character (|) as the delimiter, and I think this breaks your regexp. Does it work with another delimiter, and with the alternation grouped, like

preg_match_all('#"(?:text|html)":"([^"]*)"#',$json,$match_txt,PREG_SET_ORDER);

?
