
Getting nested divs using HTML Agility Pack in C#

I'm trying to scrape a webpage (PubMed) to see how many references appear in specific articles (some articles have references, some don't). The problem I'm having right now is that the divs are all nested and share the same names, so I haven't been able to figure out what code is required to get the elements.

So far I've tried using contains() to grab a catch-all node and dig my way in from there, but that hasn't worked:

.SelectNodes("//div[contains(@class,'portlet_title')]");

I also tried copying the XPath, but all I got back was null:

.SelectNodes("//*[@id='disc_col']/div[3]/div[1]/div/h3/span");

Any help would be appreciated, as I am no master of XPath.
And for reference, a page that would fit my criteria is: http://www.ncbi.nlm.nih.gov/pubmed/?term=23489346 (right hand side says Cited by * articles).

I've also browsed some other responses, but they all seemed to be for results with differently named divs (e.g. "get all the divs ids on a html page using Html Agility Pack"). Either I don't understand how to use this correctly, or my problem is different.

Thanks again.

Mike, try using:

    var titles = website.DocumentNode.SelectNodes("//div[@class='portlet_title']");

The errors in your XPaths are: 1. attribute tests are written inside "[]" with the "@" symbol, as I wrote above; 2. in every XPath node you should write an index, e.g. "//div[3]/div[1]/div/h3/span".
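For a fuller picture, here is a minimal sketch of how the selection might look end to end. It assumes the HtmlAgilityPack NuGet package is referenced; the class name PortletTitles, the method Extract, and the sample markup are all made up for illustration, and the real PubMed page may be structured differently. One thing worth knowing is that SelectNodes can return null rather than an empty collection when nothing matches, so a null result may simply mean the HTML that was downloaded does not contain the element (for example, if the page fills it in with script after loading).

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class PortletTitles
{
    // Returns the trimmed text of every <div> whose class attribute
    // contains "portlet_title". (Hypothetical helper, not part of the library.)
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // contains() also matches when the div carries additional classes.
        var nodes = doc.DocumentNode.SelectNodes("//div[contains(@class,'portlet_title')]");

        var titles = new List<string>();
        if (nodes == null) // no match: SelectNodes gives null, not an empty list
            return titles;

        foreach (var node in nodes)
            titles.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        return titles;
    }

    public static void Main()
    {
        // Hypothetical markup for illustration only; the real page may differ.
        const string sample = @"
<div id='disc_col'>
  <div class='portlet'>
    <div class='portlet_head'>
      <div class='portlet_title'><h3><span>Cited by 3 articles</span></h3></div>
    </div>
  </div>
  <div class='portlet'>
    <div class='portlet_head'>
      <div class='portlet_title'><h3><span>Related information</span></h3></div>
    </div>
  </div>
</div>";

        foreach (var title in Extract(sample))
            Console.WriteLine(title);
    }
}
```

If Extract comes back empty for the real page, it may help to save the downloaded HTML to a file and search it for "portlet_title" before adjusting the XPath any further.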

Good luck!
