Web scraping paginated page using HtmlAgilityPack

Question

I am building a web scraper with Html Agility Pack and have a question about pagination. I have searched the web for something to get me started, but I am nowhere near a solution. I need to scrape the content of all the paginated pages. Is there any mechanism in Html Agility Pack for doing that? I also came across other tools such as Selenium and am looking into them. Is there a way I could use Selenium together with Html Agility Pack to scrape the pages? Any help would be much appreciated. Thank you.

Answer 1

Sure, you can use HAP (Html Agility Pack) alongside Selenium. Basically, you navigate to a URL using one of the Selenium drivers and then load the page's HTML into HAP, something like the following:

using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

IWebDriver driver = new FirefoxDriver();
driver.Navigate().GoToUrl(url);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(driver.PageSource);

Once you have finished parsing the current page, navigate the driver to the next page (locate the next-page link and click it) and pass the HTML to HAP again. That said, I think most of HAP's functionality can be replaced by Selenium, so you may want to consider using Selenium only.
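
To make the loop described above concrete, here is a minimal sketch, not a definitive implementation. The class and method names (PaginatedScraper, ExtractItems, ScrapeAllPages), the item XPath, the "a.next" selector and the page cap are all assumptions for illustration; they have to be adapted to the markup of the site being scraped.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

public static class PaginatedScraper
{
    // Hypothetical selectors: adjust both to the target site's markup.
    private const string ItemXPath = "//div[@class='item']";
    private const string NextLinkCss = "a.next";

    // Parses one page of HTML with HAP and returns the trimmed text of each matching node.
    public static List<string> ExtractItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(ItemXPath);
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                results.Add(node.InnerText.Trim());
            }
        }
        return results;
    }

    // Visits up to maxPages pages, starting at startUrl, and collects the items from each.
    public static List<string> ScrapeAllPages(string startUrl, int maxPages)
    {
        var all = new List<string>();
        // Disposing the driver closes the browser even if an exception is thrown.
        using (IWebDriver driver = new FirefoxDriver())
        {
            driver.Navigate().GoToUrl(startUrl);
            for (int page = 0; page < maxPages; page++)
            {
                // Hand the HTML of the current page over to HAP.
                all.AddRange(ExtractItems(driver.PageSource));

                // FindElements returns an empty collection instead of throwing when there is no match.
                var next = driver.FindElements(By.CssSelector(NextLinkCss));
                if (next.Count == 0)
                {
                    break; // no next-page link: this was the last page
                }
                next[0].Click();
            }
        }
        return all;
    }
}
```

Calling PaginatedScraper.ScrapeAllPages(url, 50) would then return the items from every page up to the cap. The cap is only a guard against sites whose "next" link never disappears. If the site loads the next page via JavaScript without a full navigation, an explicit wait (for example Selenium's WebDriverWait) would probably be needed after the click before reading PageSource again.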

1 answers

solution1
1 2016-04-29 03:30:11