
Scraping JavaScript data with simple_html_dom.php

I get the following string when I scrape a script tag from an external page with simple_html_dom.php:

var secs = 0; 
var lastp = 0;
var newInstance = newObjce("xxx").setup(    
"more":[{.....}], 
"sources": [
{"file":"url1","label":"360p","default":"true"},
{"file":"url2","label":"480p"},
{"file":"url3","label":"720p"},
{"file":"url4","label":"1080p HD"}
], 
"morestuff":[{......}])

How can I get the data between "sources": [ ... ] and assign it to a PHP variable? var_dump always reports the result as a string, and json_encode does not help: after applying it, var_dump still shows a string. That is why I think a regular expression could help.

I found a solution. I discovered this page for generating regular expressions online: http://txt2re.com/index-php.php3 . I am leaving the function that solved my problem here in case anybody needs it in the future.

// Capture from "sources" through the first closing bracket.
// i = case-insensitive, s = let "." match newlines.
$pattern = '/("sources".*?\[.*?\])/is';
if (preg_match($pattern, $string, $matches))
{
  $string1 = $matches[1]; // the captured "sources": [ ... ] text
  print($string1);
}
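The snippet above still leaves the result as a string. As a minimal sketch of the remaining step, the pattern can capture only the bracketed array and hand it to json_decode, which yields a PHP array. The $script sample below is an assumption modelled on the question (shortened, with placeholder URLs), and it only works while no "]" appears inside the array's strings.

```php
<?php
// Sample input modelled on the question; the URLs are placeholders.
$script = <<<'JS'
var newInstance = newObjce("xxx").setup(
"sources": [
{"file":"url1","label":"360p","default":"true"},
{"file":"url2","label":"480p"}
],
"morestuff": [])
JS;

$sources = null;

// Capture only the bracketed array that follows "sources":
if (preg_match('/"sources"\s*:\s*(\[.*?\])/s', $script, $m)) {
    // Passing true makes json_decode return associative arrays.
    $sources = json_decode($m[1], true);
}

if (is_array($sources)) {
    foreach ($sources as $source) {
        echo $source['label'], ' => ', $source['file'], PHP_EOL;
    }
}
```

Each element of $sources is then an ordinary associative array, so $sources[0]['file'] gives the first URL.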

What you want can be done with regex, but it may not be the best option. For example, you can start matching at the first opening bracket "[" after sources and stop at the next closing bracket "]". See https://regex101.com/r/mVVEGp/1 .

However, you risk running into trouble if a closing bracket ever appears earlier than you expect (e.g. inside a string). You may be better off parsing the JSON with a proper parser. json_decode is a well-established native PHP function. There are also other implementations that read JSON as a stream, which work well for large data sets.
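To illustrate the bracket problem, here is a rough sketch of one way around it: scan forward from the opening bracket while tracking whether the scanner is inside a string, and pass the balanced slice to json_decode. The helper name extractJsonArray and the sample input are hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical helper: decode the JSON array that follows "<key>": in $text.
 * Brackets inside double-quoted strings are ignored. Returns null on failure.
 */
function extractJsonArray(string $text, string $key): ?array
{
    $pattern = '/"' . preg_quote($key, '/') . '"\s*:\s*\[/';
    if (!preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    // The match ends with "[", so this is the offset of that bracket.
    $start = $m[0][1] + strlen($m[0][0]) - 1;
    $depth = 0;
    $inString = false;
    $length = strlen($text);

    for ($i = $start; $i < $length; $i++) {
        $ch = $text[$i];
        if ($inString) {
            if ($ch === '\\') {
                $i++; // skip the escaped character
            } elseif ($ch === '"') {
                $inString = false;
            }
            continue;
        }
        if ($ch === '"') {
            $inString = true;
        } elseif ($ch === '[') {
            $depth++;
        } elseif ($ch === ']') {
            $depth--;
            if ($depth === 0) {
                $json = substr($text, $start, $i - $start + 1);
                $decoded = json_decode($json, true);
                return is_array($decoded) ? $decoded : null;
            }
        }
    }
    return null; // brackets never balanced
}

// Made-up input with a "]" inside a string, which would cut a non-greedy regex short.
$script = '"sources": [{"file":"url[1]","label":"360p"},{"file":"url2","label":"480p"}], "morestuff": []';
$sources = extractJsonArray($script, 'sources');
echo $sources[0]['file'], PHP_EOL;
```

The regex is only used to locate the key; the actual parsing is left to json_decode, so malformed input yields null instead of a partial match.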

The short of it is that regex is likely not the best option for this use case.


 