How to parse a widget that returns JavaScript to build the DOM instead of HTML?

We are provided a widget that displays a playlist of songs from NPR. Every few seconds we need to fetch, programmatically, the song currently on air (the one at the top of the list) and save it to a file or something similar. Easy, right? Something like

wget -qO- http://composer.nprstations.org/asasfasdfasf | sed 'find/it' > nowplaying.txt

would do it. It could be Python, Bash, Pascal, whatever.
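The "fetch every few seconds and save it to a file" part is independent of how the title is actually obtained. A minimal Python sketch, assuming a hypothetical `get_now_playing()` callable that returns the current title (or None): it polls on a fixed interval and rewrites nowplaying.txt only when the title changes. It writes to a temporary file first and swaps it in with `os.replace`, so a reader never sees a half-written file.

```python
import os
import tempfile
import time
from typing import Callable, Optional


def write_atomically(path: str, text: str) -> None:
    """Write text to a temp file in the same directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".nowplaying-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text + "\n")
        os.replace(tmp_path, path)  # atomic on POSIX when both paths are on one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def poll_now_playing(
    get_now_playing: Callable[[], Optional[str]],  # hypothetical fetcher, supplied by the caller
    path: str = "nowplaying.txt",
    interval: float = 5.0,
    max_polls: Optional[int] = None,  # None means poll forever
) -> None:
    """Call get_now_playing() every `interval` seconds and record title changes in `path`."""
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            title = get_now_playing()
        except Exception as exc:  # keep polling through transient network errors
            print(f"poll failed: {exc}")
            title = None
        if title and title != last:
            write_atomically(path, title)
            last = title
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

`get_now_playing` is whatever ends up producing the title: a headless browser, or a direct call to the widget's underlying API.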

However, on closer inspection, the widget does not simply return HTML; it returns a bunch of JavaScript that then builds the DOM. So it's client-side logic that builds the playlist, not server-side (or maybe both). Once the page has loaded in a browser, getting the text we need is easy: typing something like $('.now-playing a').first().text() in the console does the trick. But I don't know how to do this programmatically.

So we need a browser with a JavaScript engine to process the page before we can parse it. What would you recommend?

Thanks for any help!!
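Whichever tool renders the page (a headless browser is one option, for example Selenium, whose WebDriver exposes the rendered markup as `page_source`; that choice is an assumption, not something from the question), the extraction step itself does not need jQuery. A minimal stdlib Python sketch that mirrors `$('.now-playing a').first().text()`: it returns the text of the first `<a>` inside the `.now-playing` element, or None. On a trimmed copy of the widget's raw response it returns None, because the div stays empty until whatson.js fills it. The "rendered" markup in the test is invented, since the real structure the widget builds is not shown.

```python
from html.parser import HTMLParser
from typing import List, Optional


class NowPlayingParser(HTMLParser):
    """Collect the text of the first <a> nested inside an element with class "now-playing"."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []                  # currently open tags
        self.container_depth: Optional[int] = None  # stack depth of the open .now-playing element
        self.found = False                          # True once the first matching <a> is seen
        self.in_link = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        classes = (dict(attrs).get("class") or "").split()
        if self.container_depth is None and "now-playing" in classes:
            self.container_depth = len(self.stack)
        elif self.container_depth is not None and tag == "a" and not self.found:
            self.found = True  # like .first(): ignore every later link
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        if tag in self.stack:
            # Pop back to the matching open tag; this also discards void tags such as <br>.
            while self.stack.pop() != tag:
                pass
        if self.container_depth is not None and len(self.stack) < self.container_depth:
            self.container_depth = None  # the .now-playing element has closed

    def handle_data(self, data):
        if self.in_link:
            self.parts.append(data)


def now_playing_text(html: str) -> Optional[str]:
    """Stand-in for $('.now-playing a').first().text(), whitespace-stripped; None if no link."""
    parser = NowPlayingParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None
```

Running it on the static response is a quick way to confirm that plain wget plus parsing cannot work here; running it on the browser-rendered source is the programmatic equivalent of the console one-liner.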

This is the entire response from the widget server:

<head>
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
  <script src="/widgets/src/whatson.js"></script>
  <script src="/widgets/src/jquery-deparam.js"></script>
  <script>
    var configuration = {
      schedule: 'now'
    };
    if ( window.location.search ) configuration = $.extend({}, configuration, $.deparam( (window.location.search).replace('?','') ));
    $(function(){ 
      $('.now-playing').whatsOn( configuration ); 
    }); 
   </script>
</head>
<body>
  <div class="now-playing"></div>
</body>
</html>
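Note how the inline script builds its configuration: it starts from `{schedule: 'now'}` and overlays whatever is in the widget URL's query string (`$.extend({}, configuration, $.deparam(...))`). So the parameters the widget needs, presumably including the station ID, should be visible in the widget URL itself. A rough Python equivalent for flat `key=value` parameters using `urllib.parse` (jQuery deparam also understands nested `a[b]=c` keys, which this sketch does not); the example URL and its `station` parameter are made up.

```python
from urllib.parse import parse_qsl, urlsplit

DEFAULTS = {"schedule": "now"}  # same default as the widget's inline script


def widget_configuration(widget_url: str) -> dict:
    """Merge query-string parameters over the defaults, like $.extend({}, defaults, $.deparam(query))."""
    query = urlsplit(widget_url).query
    params = dict(parse_qsl(query, keep_blank_values=True))  # if a key repeats, the last value is kept
    return {**DEFAULTS, **params}
```

For example, `widget_configuration("https://example.com/widget?station=EXAMPLE_ID")` gives `{"schedule": "now", "station": "EXAMPLE_ID"}`, which is what the widget would pass to `whatsOn`.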

From what you posted, the page runs the whatsOn jQuery plugin (loaded from /widgets/src/whatson.js) on the .now-playing div.

As you mentioned in the comments, it connects to the API using your station ID.
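To find out exactly which API whatsOn talks to, the natural place to look is /widgets/src/whatson.js itself (the browser's network tab shows the same requests live). A quick, crude Python sketch that lists absolute URL-looking string literals in a piece of JavaScript source; it is a regex heuristic rather than a JS parser, so it will miss URLs assembled at runtime, and the sample source in the test is invented.

```python
import re
from typing import List

# An http(s) URL that directly follows a single, double or back quote and ends at a quote.
URL_LITERAL = re.compile(r"""["'`](https?://[^"'`\s]+)["'`]""")


def url_literals(js_source: str) -> List[str]:
    """Return unique URL string literals in order of first appearance."""
    seen: List[str] = []
    for match in URL_LITERAL.finditer(js_source):
        url = match.group(1)
        if url not in seen:
            seen.append(url)
    return seen
```

Feed it the text of whatson.js (downloaded with wget or any HTTP client) and check which of the hits take a station parameter.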

So the next step is to simulate the same thing: use wget, or some scripting language, to call the same API with the same station ID as the JavaScript does, and analyze what you get back.
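Once the endpoint is known, replaying the request is ordinary HTTP plus JSON. A minimal Python sketch using only the standard library; the endpoint, the `station` and `schedule` query parameters, and the response shape (a JSON list of items with `artist` and `title` fields, currently airing first) are all hypothetical placeholders, to be replaced with whatever whatson.js actually sends and receives.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

API_URL = "https://api.example.invalid/playlist"  # hypothetical; take the real one from whatson.js


def build_request_url(station_id: str, schedule: str = "now") -> str:
    """Build the (hypothetical) API URL with the parameters the widget would send."""
    return API_URL + "?" + urlencode({"station": station_id, "schedule": schedule})


def parse_now_playing(payload: bytes) -> Optional[str]:
    """Pick the first (currently airing) item from a hypothetical JSON playlist."""
    items = json.loads(payload.decode("utf-8"))
    if not items:
        return None
    first = items[0]
    parts = [p for p in (first.get("artist"), first.get("title")) if p]
    return " - ".join(parts) or None


def fetch_now_playing(station_id: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch and parse the current song; network and JSON errors propagate to the caller."""
    with urllib.request.urlopen(build_request_url(station_id), timeout=timeout) as resp:
        return parse_now_playing(resp.read())
```

Keeping URL building and parsing separate from the network call makes it easy to test each piece against a saved response before pointing it at the live API and calling it every few seconds.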

Then repeat the same for any further requests, if needed.

It is impossible to solve the problem completely without access to that API.

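Alternatively, take a look at the WWW::Mechanize::Firefox Perl module, which automates a real Firefox instance, so the page's JavaScript runs just as it would in a normal browser. It comes with examples and a cookbook ;)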
