
C# headless browser with JavaScript support for crawler

Can anyone suggest a headless browser for .NET that supports cookies and legitimate JavaScript execution?

Selenium + HtmlUnitDriver/GhostDriver is exactly what you are looking for. Oversimplified, Selenium is a library for driving a variety of browsers for automation purposes: testing, scraping, and task automation.

Selenium has different WebDriver classes, each of which operates an actual browser. HtmlUnitDriver is a headless one (it is a Java library, so from C# you reach it through RemoteWebDriver and a Selenium server). GhostDriver is a WebDriver implementation for PhantomJS, so you write C# while PhantomJS does the heavy lifting.

Here is a code snippet from the Selenium docs for Firefox; code using GhostDriver (PhantomJS) or HtmlUnitDriver is almost identical.

using System; // needed for TimeSpan and StringComparison
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;
using OpenQA.Selenium.Support.UI;

class GoogleSuggest
{
    static void Main(string[] args)
    {
        // Driver initialization varies across different drivers,
        // but they all support parameterless constructors.
        IWebDriver driver = new FirefoxDriver();
        try
        {
            driver.Navigate().GoToUrl("http://www.google.com/");

            // Type into the search box (named "q") and submit its form.
            IWebElement query = driver.FindElement(By.Name("q"));
            query.SendKeys("Cheese");
            query.Submit();

            // Wait up to 10 seconds for the results page title to appear.
            WebDriverWait wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
            wait.Until(d => d.Title.StartsWith("cheese", StringComparison.OrdinalIgnoreCase));

            Console.WriteLine("Page title is: " + driver.Title);
        }
        finally
        {
            // Always shut down the browser, even if a step above throws.
            driver.Quit();
        }
    }
}
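
The question specifically asks about cookies and JavaScript execution. The following is a minimal sketch, not part of the original answer, assuming the Selenium.WebDriver NuGet package (4.x), a local Chrome installation and a matching chromedriver. It swaps in headless Chrome for PhantomJS (whose development has been suspended); the class name, the helper Visit and the cookie name crawler_demo are hypothetical.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

static class HeadlessCookieJsSketch
{
    // Hypothetical helper: loads a page in headless Chrome, sets a cookie,
    // then reads data back through the cookie jar and through page JavaScript.
    public static (int CookieCount, string Title) Visit(string url)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless"); // no browser window is opened

        IWebDriver driver = new ChromeDriver(options);
        try
        {
            driver.Navigate().GoToUrl(url);

            // A cookie can only be added for the domain that is currently loaded,
            // which is why the page is opened first.
            driver.Manage().Cookies.AddCookie(new Cookie("crawler_demo", "1"));
            int cookieCount = driver.Manage().Cookies.AllCookies.Count;

            // Run JavaScript inside the page and receive its return value in C#.
            var js = (IJavaScriptExecutor)driver;
            string title = (string)js.ExecuteScript("return document.title;");

            return (cookieCount, title);
        }
        finally
        {
            driver.Quit(); // shut down the browser process in all cases
        }
    }

    static void Main()
    {
        var (count, title) = Visit("https://example.com/");
        Console.WriteLine($"Cookies: {count}, title via JS: {title}");
    }
}
```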

If you run this on a Windows machine, you can use the actual Firefox/Chrome driver: it opens a real browser window that operates exactly as programmed in your C# code. HtmlUnitDriver is the most lightweight and fastest option.

I have successfully run Selenium for C# (FirefoxDriver) on Linux using Mono. I suppose HtmlUnitDriver will work just as well as the others, so if you need speed, I suggest going with Mono (you can develop, test and compile with Visual Studio on Windows, no problem) plus Selenium HtmlUnitDriver running on a Linux host without a desktop.
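
On a Linux host without a desktop, the browser has to run headless, while on Windows a visible window is fine. A hypothetical helper like the one below (not part of Selenium) could decide this at startup, so one build runs in both places. It is only a heuristic and an assumption of this sketch: it treats Windows and macOS as having a display, and on Linux treats an unset DISPLAY environment variable as "no screen".

```csharp
using System;
using System.Runtime.InteropServices;

static class DisplayCheck
{
    // Pure decision logic, kept separate so it can be checked without a browser.
    public static bool ShouldRunHeadless(bool isWindows, bool isMac, string display)
    {
        if (isWindows || isMac) return false;   // desktop OSes normally have a screen
        return string.IsNullOrEmpty(display);   // Linux/X11: no DISPLAY, no screen
    }

    // Convenience overload that inspects the current process environment.
    public static bool ShouldRunHeadless() =>
        ShouldRunHeadless(
            RuntimeInformation.IsOSPlatform(OSPlatform.Windows),
            RuntimeInformation.IsOSPlatform(OSPlatform.OSX),
            Environment.GetEnvironmentVariable("DISPLAY"));

    static void Main()
    {
        Console.WriteLine("Run headless: " + ShouldRunHeadless());
    }
}
```

With headless Chrome, the result would feed a line such as `if (DisplayCheck.ShouldRunHeadless()) options.AddArgument("--headless");` before the driver is constructed.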

I am not aware of a .NET-based headless browser, but there is always PhantomJS, which is written in C/C++ and works fairly well for unit testing JavaScript with QUnit.

There is also another relevant question that might help you: Headless browser for C# (.NET)?

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 