简体   繁体   中英

How to load HTML after content has been loaded

I am trying to get a list of the content on a website ( this one if anyone is interested). The layout has changed recently and now they do not load the content all at once, but with magic (js probably). I'm currently using JSoup to analyze the HTML, but im open to suggestions.

This is what i am getting:

<div class="row" data-v-6e4dbe9e>
 <div class="col-17 podcasts-group" data-v-6e4dbe9e>
  <div class="loading-spinner" data-v-6e4dbe9e>      //the devil himself
   <div class="spinner" data-v-ac3cb376 data-v-6e4dbe9e>
    <div class="rect1" data-v-ac3cb376></div>
    <div class="rect2" data-v-ac3cb376></div>
    <div class="rect3" data-v-ac3cb376></div>
    <div class="rect4" data-v-ac3cb376></div>
    <div class="rect5" data-v-ac3cb376></div>
   </div>
  </div>
  <div mode="in-out" class="transition-group row" data-v-6e4dbe9e>
   //Here should be stuff!
  </div>
 </div>
</div>

the code that achieves this:

String selector = "div.podcasts-items";
Elements elem = Jsoup.connect(link).get().select(selector)
System.out.println("html: "+elem.html());

This is what i would like to see (copied from inspect element after the page has loaded all the content):

<div class="row" data-v-6e4dbe9e>
 <div class="col-17 podcasts-group" data-v-6e4dbe9e>
  <!---->  //begone evil!
  <div mode="in-out" class="transition-group row" data-v-6e4dbe9e>
   <div class="col-17 col-md-8 center-margin" data-v-6e4dbe9e="">...</div>
   <div class="col-17 col-md-8 center-margin" data-v-6e4dbe9e="">...</div>
   <div class="col-17 col-md-8 center-margin" data-v-6e4dbe9e="">...</div>
   <div class="col-17 col-md-8 center-margin" data-v-6e4dbe9e="">...</div>
  </div>
 </div>
</div>

Google doesn't help much, because every content related to spinners etc. is about javascript.

solution:

due to the fact that JSoup only loads the HTML and does not execute any javascript the page never had a chance to load the content. You would have to use an actual browser engine or a webdriver like selenium to get the data to load.

For this specific problem I was able to get the content directly via loading the Json data through this webpage's API.

If I understood your question then your best bet is to use Selenium driver. Link to similar question

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM