Web scraping paginated page using HtmlAgilityPack

I am building a web scraper with Html Agility Pack and have a question about pagination. I have searched all over the web for something to help me move forward, but I am nowhere near a solution. I need to scrape the content of every page in a paginated listing. Is there any mechanism in HtmlAgilityPack for doing that? I also came across other tools such as Selenium and am looking into them. Is there a way I could use Selenium together with HtmlAgilityPack to do the scraping? Any help would be much appreciated. Thank you.

Sure, you can use HAP alongside Selenium. Basically, you navigate to a URL with one of the Selenium drivers and then load the resulting HTML into HAP, something like the following:

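using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

IWebDriver driver = new FirefoxDriver();   // launches a Firefox instance
driver.Navigate().GoToUrl(url);            // url: address of the page to scrape
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(driver.PageSource);           // hand the rendered HTML to HAP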

Once you have finished parsing the current page, navigate the driver to the next page (locate the next-page link and click it) and pass the HTML to HAP again. That said, I think most of HAP's functionality can be replaced by Selenium, so you may want to consider using Selenium only.
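The parse, click, re-parse steps above can be sketched as a loop. This is only a minimal sketch under assumptions that are not from the original post: the start URL, the XPath for the items, the CSS selector for the next-page link, and the page cap are all placeholders that need to be adapted to the real site.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

class PaginatedScraper
{
    static void Main()
    {
        // Hypothetical values for illustration; replace them for the target site.
        const string startUrl = "https://example.com/list?page=1";
        const string itemXPath = "//div[@class='item']";
        const string nextLinkCss = "a.next";
        const int maxPages = 50; // safety cap so the loop cannot run forever

        var results = new List<string>();
        IWebDriver driver = new FirefoxDriver();
        try
        {
            driver.Navigate().GoToUrl(startUrl);

            for (int page = 0; page < maxPages; page++)
            {
                // Hand the HTML of the current page to HAP.
                var doc = new HtmlDocument();
                doc.LoadHtml(driver.PageSource);

                // SelectNodes returns null when nothing matches.
                var nodes = doc.DocumentNode.SelectNodes(itemXPath);
                if (nodes != null)
                {
                    foreach (HtmlNode node in nodes)
                    {
                        results.Add(node.InnerText.Trim());
                    }
                }

                // FindElements returns an empty collection (no exception)
                // when there is no next-page link, which ends the loop.
                var nextLinks = driver.FindElements(By.CssSelector(nextLinkCss));
                if (nextLinks.Count == 0)
                {
                    break;
                }
                nextLinks[0].Click();
            }
        }
        finally
        {
            driver.Quit(); // always close the browser
        }

        foreach (string text in results)
        {
            Console.WriteLine(text);
        }
    }
}
```

If the site loads the next page via JavaScript instead of a full navigation, the loop would probably need an explicit wait (for example, WebDriverWait from the Selenium.Support package) after the click and before reading PageSource again.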

 