
Jsoup can't retrieve this document

Here is the URL: http://immobilier.nc/recherche?section=offres_vente&bien=&prix_location=&prix_vente=&pays=nc&ville=&quartier=&par_page=25&orderBy=&orderDirection=DESC&moteurRecherche_option=last_offr

Here is my code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document doc = Jsoup.connect(url)
        .timeout(10000) // 10-second timeout
        .followRedirects(true)
        .validateTLSCertificates(false)
        .get();

The problem is that I get a different page from the one I see in the browser.

For example, this element appears in the browser but is missing from the Jsoup Document:

<tr style="cursor:pointer;" id="235005" class="showOffre setPushStat ajax" href="menu_detail_offre.php?checksum=IM-O-58cf724c03e64" data-divdest="detail_235005" data-godiv="detail_235005" data-pushstat_url="!O-235005">
    <td align="left" style="vertical-align:middle"><img src="/photos.immobilier.nc//gw/2017/4/_thumbs/bb3dfed8-66f6-4a6b-939a-a47b70c998ba.jpeg" width="100"></td>
    <td nowrap="" align="left" style="vertical-align:middle"> 235005</td>
    <td align="left" style="vertical-align:middle">Vente</td>
    <td align="left" style="vertical-align:middle"><img src="http://immobilier.nc/images/part_promobat_mini.jpg" style="display: none !important;"> </td>
    <td align="left" style="vertical-align:middle">Appartement</td>
    <td align="left" style="vertical-align:middle">F3</td>
    <td align="left" style="vertical-align:middle">Nouméa</td>
    <td align="left" style="vertical-align:middle">Ouémo</td>
    <td nowrap="" align="left" style="vertical-align:middle">35.278 U</td>
    <td align="left" style="vertical-align:middle">17/04/2017</td>
</tr>

The part you show as missing from the initial request is the content of the table that lists the offers. The page loads this table with an AJAX call to http://immobilier.nc/immo_offres.php and then inserts the result into the displayed page.

Jsoup receives the same initial HTML from the URL you show as the browser does, but only that initial document. Jsoup does not execute the JavaScript in the page and makes no additional requests, so you do not get the content the browser shows after it has loaded the page and filled it with the results of the additional AJAX calls.
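One possible workaround is to request the AJAX endpoint directly instead of the search page. The sketch below assumes two things that are not confirmed here: that immo_offres.php accepts the same query string as the search URL, and that the server may expect the `X-Requested-With: XMLHttpRequest` header that jQuery-style AJAX calls send. Check the real request in the browser's developer tools (Network tab) and copy its exact parameters. The class `OffresFetch` and its methods are hypothetical names, and the request is made with Java 11's `java.net.http` client rather than Jsoup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: fetches the offers table directly from the AJAX endpoint.
class OffresFetch {

    // Endpoint the search page calls via AJAX to fill the offers table.
    static final String AJAX_ENDPOINT = "http://immobilier.nc/immo_offres.php";

    // ASSUMPTION: the endpoint accepts the same query string as the search page,
    // so the search URL's raw query is copied onto the endpoint unchanged.
    static URI ajaxUri(String searchUrl) {
        String query = URI.create(searchUrl).getRawQuery();
        return URI.create(query == null ? AJAX_ENDPOINT : AJAX_ENDPOINT + "?" + query);
    }

    // ASSUMPTION: the server may check X-Requested-With, which jQuery-style AJAX calls send.
    static HttpRequest ajaxRequest(String searchUrl) {
        return HttpRequest.newBuilder(ajaxUri(searchUrl))
                .timeout(Duration.ofSeconds(10)) // same 10 s timeout as the Jsoup call
                .header("X-Requested-With", "XMLHttpRequest")
                .GET()
                .build();
    }

    // Sends the request and returns the raw HTML; parse it afterwards with Jsoup.parse(html).
    static String fetchOffersHtml(String searchUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(ajaxRequest(searchUrl), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The string returned by `fetchOffersHtml(url)` can then be parsed with `Jsoup.parse(html)`. The same request can also be made with Jsoup itself, for example `Jsoup.connect(OffresFetch.ajaxUri(url).toString()).header("X-Requested-With", "XMLHttpRequest").get()`. Calling the endpoint directly avoids running the page's JavaScript, but it depends on the site's internal AJAX interface, which can change without notice.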

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM