PHP Regular Expression to extract JSON data

I have the following string:

window['test'] = false; 
window['options'] = true; 
window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"};

How would I go about extracting the JSON data in window['data'] ? The example data I provided is just a small sample of what really exists. There could be more data before and/or after window['data'] .

I've tried this but had no luck:

preg_match( '#window["test"] = (.*?);\s*$#m', $html, $matches );

There are several issues that I can see.

  1. Your string uses single quotes ( window['test'] ), but your regular expression looks for window["test"] . To match the single quotes, enclose the regular expression in a double-quoted PHP string (or escape the single quotes).

  2. Your regular expression has unescaped square brackets, which are used to create a character class. You should use \\[ instead of just [ (and \\] instead of ] ).

  3. You say you are looking for data but your regular expression looks for test .

  4. You have a $ at the end of the regular expression, which (with the m flag) means you won't match if anything other than whitespace follows the semicolon on the same line.

Also your data seems incomplete, there are some missing brackets at the end, but I think that is just a copy-paste error.
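Points 2 and 4 can be checked in isolation. The following is a minimal sketch; the $html value is a made-up two-line sample (including the trailing "// note"), not the asker's real page.

```php
<?php
// Hypothetical sample input, one assignment per line.
$html = "window['test'] = false;\nwindow['options'] = true; // note\n";

// Unescaped brackets: ["test"] is a character class, so this pattern asks for
// "window" followed by ONE of the characters ", t, e, s. It never matches here.
$unescaped = preg_match('#window["test"] = (.*?);#', $html);

// Escaped brackets plus single quotes match the literal text window['test'].
$escaped = preg_match("#window\\['test'\\] = (.*?);#", $html, $m);

// With the m flag, $ anchors at the end of a line, so the text after the
// semicolon ("// note") prevents a match.
$anchored = preg_match("#window\\['options'\\] = (.*?);\\s*$#m", $html);

var_dump($unescaped, $escaped, $m[1], $anchored);
```

preg_match() returns 1 on a match and 0 otherwise, so only the second call should succeed.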

So I would try:

php > preg_match("#window\\['data'\\]\\s*=\\s*(.*?);#", $html, $matches);
php > print_r($matches);
Array
(
    [0] => window['data'] = {"id":2345,"stuff":[{"id":704,"name":"test"};
    [1] => {"id":2345,"stuff":[{"id":704,"name":"test"}
)

Of course then you must use json_decode() to convert the JSON string ( $matches[1] ) into an object or associative array that you can use.
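As a rough sketch of that last step, assuming the JSON is complete (the missing "]}" is restored in this hypothetical $html), the decode and an error check might look like this:

```php
<?php
// Hypothetical input: the question's data with the missing "]}" restored.
$html = <<<'JS'
window['test'] = false;
window['options'] = true;
window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"}]};
JS;

$data = null;
if (preg_match("#window\\['data'\\]\\s*=\\s*(.*?);#", $html, $matches)) {
    // Passing true returns associative arrays instead of stdClass objects.
    $data = json_decode($matches[1], true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // Truncated JSON, like the sample in the question, ends up here.
        echo 'Invalid JSON: ' . json_last_error_msg() . "\n";
    }
}

if (is_array($data)) {
    echo $data['id'], "\n";
    echo $data['stuff'][0]['name'], "\n";
}
```

Checking json_last_error() matters here because json_decode() returns null for invalid input, which is easy to mistake for an empty result.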

You can use this regex:

window\['data'\]\s*=\s*(.*?);

The match information is:

MATCH 1
1.  [67-111]    `{"id":2345,"stuff":[{"id":704,"name":"test"}`

As regex101 suggests, you could use code like this:

$re = "/window\\['data'\\]\\s*=\\s*(.*);/"; 
$str = "window['test'] = false; window['options'] = true; window['data'] = {\"id\":2345,\"stuff\":[{\"id\":704,\"name\":\"test\"};"; 

preg_match_all($re, $str, $matches);
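Note that the pattern above uses a greedy (.*) while the earlier one uses a lazy (.*?). On the sample string they capture the same text, but they can differ when more assignments follow on the same line. A small sketch, using a made-up $str with a hypothetical window['after'] entry:

```php
<?php
// Hypothetical single-line input with another assignment AFTER window['data'].
$str = "window['data'] = {\"id\":2345}; window['after'] = 1;";

// Lazy: the capture stops at the first semicolon.
preg_match("/window\\['data'\\]\\s*=\\s*(.*?);/", $str, $lazy);

// Greedy: the capture runs to the last semicolon on the line.
preg_match("/window\\['data'\\]\\s*=\\s*(.*);/", $str, $greedy);

echo $lazy[1], "\n";
echo $greedy[1], "\n";
```

Neither form is safe in every case: the lazy version cuts the JSON short if a string value contains a semicolon, and the greedy version swallows anything that follows on the same line.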

You can parse the window data with the regular expression:

/^window\[['"](\w+)['"]\]\s*=\s*(.+);\s*$/m

Then you can look up each piece by its original key in window (for example, data ) and parse the JSON at your leisure.

$data = <<<_E_
window['test'] = false;
window['options'] = true;
window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"}]};
_E_;
$regex = <<<_E_
/^window\[['"](\w+)['"]\]\s*=\s*(.+);\s*$/m
_E_;

if( preg_match_all($regex,$data,$matches) > 0 ) {
    var_dump($matches);
    $index = array_search('data',$matches[1]);
    if( $index !== false ) {
        var_dump(json_decode($matches[2][$index]));
    } else { echo 'no data section'; }
} else { echo 'no matches'; }

Output:

// $matches
array(3) {
  [0]=>
  array(3) {
    [0]=>    string(24) "window['test'] = false; "
    [1]=>    string(26) "window['options'] = true; "
    [2]=>    string(69) "window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"}]};"
  }
  [1]=>
  array(3) {
    [0]=>    string(4) "test"
    [1]=>    string(7) "options"
    [2]=>    string(4) "data"
  }
  [2]=>
  array(3) {
    [0]=>    string(5) "false"
    [1]=>    string(4) "true"
    [2]=>    string(51) "{ "id" : 2345, "stuff": [{"id":704,"name":"test"}]}"
  }
}
// decoded JSON
object(stdClass)#1 (2) {
  ["id"]=>      int(2345)
  ["stuff"]=>
  array(1) {
    [0]=>
    object(stdClass)#2 (2) {
      ["id"]=>      int(704)
      ["name"]=>    string(4) "test"
    }
  }
}

Note: I fixed the JSON in your example to be valid so it would actually parse.
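If you want every window entry rather than just data, one possible variation (a sketch, not part of the code above; the $window array is a name introduced here for illustration) is to combine the two capture groups into a map keyed by name:

```php
<?php
$data = <<<'JS'
window['test'] = false;
window['options'] = true;
window['data'] = { "id" : 2345, "stuff": [{"id":704,"name":"test"}]};
JS;

// Same pattern as above, written as a single-quoted string.
$regex = '/^window\[[\'"](\w+)[\'"]\]\s*=\s*(.+);\s*$/m';

$window = [];
if (preg_match_all($regex, $data, $matches) > 0) {
    // Keys come from capture group 1, raw values from capture group 2.
    $raw = array_combine($matches[1], $matches[2]);
    foreach ($raw as $name => $value) {
        // json_decode() also handles bare scalars such as true and false.
        $window[$name] = json_decode($value);
    }
}

var_dump($window['test'], $window['options'], $window['data']->stuff[0]->name);
```

Values that are not valid JSON (for instance a JavaScript expression) would decode to null, so check json_last_error() if that distinction matters.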
