简体   繁体   中英

Scraping data using HTMLAgilityPack

In HTMLAgailityPack, how to get the data from the website which is not coming in the innerhtml method of it. For example, if in the link below:

https://www.theice.com/productguide/ProductSpec.shtml?specId=1496#expiry

The table starting with contract symbol is not coming in the innerhtmltext. Please let me know how to get this table data through HTMLAgailityPack?

Regards

You need to send a GET request to https://www.theice.com/productguide/ProductSpec.shtml?expiryDates=&specId=1496&_=1342907196619

The content is being loaded dynamically via javascript. Perhaps you can parse the innerhtmltext to see what link the javascript will send the GET request to

If its not 'coming in the innerhtml' that would mean that its being put in there by a script. I'm not able to check this page myself so I'm not sure.

If its coming from a script, you can't get it very easily. You can play around viewing the javascript and maybe being able to read the data as its coming in.

Basically install Firebug on your browser, and look at the data transfers being made. Sometimes you're lucky, sometimes you're not.

Or you can take the simple method and use the winforms WebBrowser control, load it in it, let it run the script then scrape from there. Note that this will leak memory and GDI handles like crazy.

Pleae use this XPath to get that table you want //*[@id="right"]/div/table

eg

HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id="right"]/div/table"));
string html = node.InnerHtml;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM