
Download an HTML page after it has fully loaded (for parsing)

I want to read and parse a page by its URL. I build the URL dynamically, for example https://search.aviasales.ru/MOW2405CHI30061 (origin city, departure date, destination city, return date, number of passengers). The page does not load all at once: during the first few seconds only part of it appears. If I try to load it like this:

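        using System.Net;
        using System.Text;
        using HtmlAgilityPack; // HtmlDocument comes from the HtmlAgilityPack package

        // WebClient returns only the HTML the server sends in its initial response
        using (var web = new WebClient { Encoding = Encoding.UTF8 })
        {
            string str = web.DownloadString("https://search.aviasales.ru/MOW2405ATH30061");
            var doc = new HtmlDocument();
            doc.LoadHtml(str);
        }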

I only get part of the page, but I need the other parts too. Those parts load after a short delay (through Angular scripts or partial views). How can I load the complete page?

That page uses AJAX to load the data, so your code will only give you the basic container HTML, not the bit you want.

You'd have to study the source of the main page, read the JavaScript and work out which AJAX call(s) it makes to get the data. You'll then need to call those URLs yourself and parse the data they return.
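A minimal sketch of that approach, assuming you have already identified (for example in the Network tab of your browser's developer tools) an AJAX endpoint that returns a JSON array of objects. The endpoint URL, the `AjaxScraper` class and the `price` field are hypothetical placeholders, not Aviasales' actual API; substitute whatever the page really requests. It uses `HttpClient` and `System.Text.Json` (.NET Core 3.0 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class AjaxScraper
{
    // One shared HttpClient for the whole application instead of one per request
    private static readonly HttpClient Http = new HttpClient();

    // Requests the JSON endpoint that the page itself calls (the URL you pass in is an assumption)
    public static Task<string> FetchJsonAsync(string endpointUrl) => Http.GetStringAsync(endpointUrl);

    // Extracts numeric "price" values from a JSON array of objects.
    // The response shape is hypothetical: items without a numeric "price" are skipped,
    // and a non-array root throws, so a change in the site's format is noticed rather than ignored.
    public static List<decimal> ParsePrices(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        if (doc.RootElement.ValueKind != JsonValueKind.Array)
            throw new FormatException("Expected a JSON array; the response format may have changed.");

        var prices = new List<decimal>();
        foreach (JsonElement item in doc.RootElement.EnumerateArray())
        {
            // TryGetProperty is only valid on objects, so check the kind first
            if (item.ValueKind == JsonValueKind.Object
                && item.TryGetProperty("price", out JsonElement price)
                && price.ValueKind == JsonValueKind.Number)
            {
                prices.Add(price.GetDecimal());
            }
        }
        return prices;
    }
}
```

Usage, with a placeholder URL: `var prices = AjaxScraper.ParsePrices(await AjaxScraper.FetchJsonAsync(endpointUrl));`. Keeping the download and the parsing in separate methods lets you test the parser against saved responses, which helps when the site changes its format.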

This is known as "screen scraping" and has many pitfalls. It would be worth reading up on it and making sure you know what you are letting yourself in for, as you could invest a lot of work in scraping their page, only to have them make a simple change that completely breaks your code.

It would also be worth checking out if they have an API you can call, as that will be documented and resistant to change. The way you are trying to do it is very fragile.
