Web Scraping in C# just isn't working as expected
I've been trying to scrape some data from a website, but my code doesn't seem to run as expected. It just doesn't get me the HTML of the page.
public Scraper()
{
    BGWorker.DoWork += GetHtml;
    BGWorker.RunWorkerAsync();
}
static void GetHtml(object sender, DoWorkEventArgs e)
{
    System.Threading.Thread.Sleep(1);
    Console.WriteLine("Downloading Data...");
    ScrapingBrowser _ScrapingBrowser = new ScrapingBrowser();
    WebPage webPage = _ScrapingBrowser.NavigateToPage(new Uri("https://www.goodwebsite.com"));
    Console.WriteLine(webPage.Html);
    Console.WriteLine("Got the Data");
}
If you are using ScrapySharp, make sure you are on the latest version.
To display the HTML of the web page, you are missing .InnerHtml after webPage.Html:
static void GetHtml(object sender, DoWorkEventArgs e)
{
    System.Threading.Thread.Sleep(1);
    Console.WriteLine("Downloading Data...");
    ScrapingBrowser _ScrapingBrowser = new ScrapingBrowser();
    WebPage webPage = _ScrapingBrowser.NavigateToPage(new Uri("https://www.goodwebsite.com"));
    // Print the markup string, not the node object itself.
    Console.WriteLine(webPage.Html.InnerHtml);
    Console.WriteLine("Got the Data");
}
Basically, you were just outputting the object's type rather than its value.
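The point about printing the type instead of the value can be reproduced without any scraping library. The sketch below is only an illustration: `FakeHtmlNode` is a hypothetical stand-in invented here, not the real node type returned by `webPage.Html`, and it assumes (as the answer implies for the real type) that `ToString()` is not overridden.

```csharp
using System;

// Hypothetical stand-in for a parsed HTML node; this is NOT the real
// ScrapySharp or HtmlAgilityPack type, just a class with an InnerHtml property.
public class FakeHtmlNode
{
    public string InnerHtml { get; }

    public FakeHtmlNode(string innerHtml)
    {
        InnerHtml = innerHtml;
    }
    // ToString() is deliberately not overridden, so the default
    // implementation from System.Object is used.
}

public static class Program
{
    public static void Main()
    {
        var node = new FakeHtmlNode("<p>Hello</p>");

        // Console.WriteLine(object) calls ToString(); the default
        // implementation returns the type name, not the content.
        Console.WriteLine(node);            // prints "FakeHtmlNode"

        // Passing the string property writes the actual markup.
        Console.WriteLine(node.InnerHtml);  // prints "<p>Hello</p>"
    }
}
```

If the first line of output is a type name rather than markup, the object itself is being printed, which matches the symptom described in the question.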
You can use Selenium for C#; just install it from the NuGet package manager. It looks like this:
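using OpenQA.Selenium.Chrome;

var chromeOptions = new ChromeOptions();
chromeOptions.AddArgument("headless"); // run Chrome without a visible window
ChromeDriver driver = new ChromeDriver(chromeOptions);
driver.Navigate().GoToUrl("https://www.bikes.com/");
var source = driver.PageSource; // HTML of the loaded page as a string
Console.WriteLine(source);
driver.Quit(); // close the browser and end the driver process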