简体   繁体   中英

how to get javascript code too with the actual source with Html Agility Pack

i am getting source of a website using Html Agility pack which is different than the code when i inspect with firebug.i have searched many things but still not getting clear of what i should do.Source is different than the code when i inspect please tell me how to get javascript code too with that Html. Even when i disable javascript in my browser i still cannot get the Javascript code along the source. i am using

string url="";
HtmlDocument doc = new HtmlDocument();
                WebClient client = new WebClient();
                html = client.DownloadString(url);
                doc.LoadHtml(html);

to get source tell me if i should need a request and response method to get JS code too.

To expand on @alecxe answer, you can use Selenium* to load your target page like a real browser would do, and then pass the result to HtmlAgilityPack for further processing :

using OpenQA.Selenium;

.....

IWebDriver driver = new PhantomJS.PhantomJSDriver();
driver.Navigate().GoToUrl(url);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(driver.PageSource);

alternatively, you can just run your query (XPath or CSS selector) using Selenium directly, for example :

var result = driver.FindElements(By.XPath("your query"));

//print HTML of the returned elements
foreach (var item in result)
{
    Console.WriteLine(item.GetAttribute("outerHTML"));
}

*) Need to download Selenium first, as well as the driver ie PhantomJS, Firefox, etc. Selenium can be installed to your project easily from NuGet .

For that you would need a real browser. Consider automating a browser (which can be headless - see PhantomJS ) with the help of selenium .

See also:

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM