Regular Expression to Extract Javascript Object from Scraped HTML

I have scraped a full HTML page that contains a lot of markup, including HTML, CSS and JS code.

Example below (content stripped down):

<p>blah blah blah html</p>
<script type="text/javascript">window._userData ={"country_code": "PK", "language_code": "en",user:[{"user": {"username": "johndoe", "follows":12,"biography":"blah blah blah","feedback_score":99}}],"another_var":"another value"} </script>
<script> //multiple script tags can be here... </script>
<p>blah blah blah html</p>

Now I want to extract the object assigned to window._userData and then, if possible, convert the extracted string into a PHP object or array.

I have tried a few regular expressions found on SO but couldn't get any of them working.

I have also tried the answer to a similar question, "Regular expression extract a JavaScript variable in PHP".

Thanks

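Find the object with a regex (lazy match, with the s flag so the object may span several lines):

preg_match('/\bwindow\._userData\s*=\s*(\{.+?\})\s*;?\s*<\/script>/s', $html, $m);

and decode it:

json_decode(trim($m[1]), true);

Before decoding, though, the captured string has to be valid JSON. In the sample the key user is unquoted, so json_decode returns null until that key is wrapped in double quotes.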
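
Putting the two steps together, here is a minimal sketch, assuming the only defect in the object literal is unquoted keys. The function name extractUserData and the key-quoting regex are illustrative and do not come from any library. The key-quoting step is naive: it would misfire if a string value contained text shaped like `, word:`, so treat it as a starting point rather than a general JavaScript-to-JSON converter.

```php
<?php
// Hypothetical helper: pull the object assigned to window._userData out of raw HTML.
function extractUserData(string $html): ?array
{
    // Lazily capture from the first "{" to the "}" that directly precedes </script>.
    $pattern = '/\bwindow\._userData\s*=\s*(\{.+?\})\s*;?\s*<\/script>/s';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }

    // Quote bare keys such as  user:  so the literal becomes valid JSON.
    // Assumption: no string value contains text shaped like  , word:
    $json = preg_replace('/([{,]\s*)([A-Za-z_$][\w$]*)\s*:/', '$1"$2":', $m[1]);

    $data = json_decode($json, true);
    return is_array($data) ? $data : null;
}

$html = <<<'HTML'
<p>blah blah blah html</p>
<script type="text/javascript">window._userData ={"country_code": "PK", "language_code": "en",user:[{"user": {"username": "johndoe", "follows":12,"biography":"blah blah blah","feedback_score":99}}],"another_var":"another value"} </script>
<script> //multiple script tags can be here... </script>
HTML;

$data = extractUserData($html);
echo $data['user'][0]['user']['username'], "\n"; // prints "johndoe"
```

The function returns null both when the pattern is not found and when the repaired string still fails to decode, so the caller can check for a single failure value. If the real page embeds a richer JavaScript literal (single-quoted strings, trailing commas, comments), a proper JavaScript parser is likely a safer choice than regex repair.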
