
Scrape a javascript-generated website in C# without installing a browser

I am developing a website crawler API to scrape a JavaScript-generated website. The website we are crawling requires JavaScript to be enabled to fully render the HTML. I have tried several libraries such as HtmlAgilityPack and AngleSharp, but they are just HTML parsers and cannot render the page because they lack JavaScript capability.

I tried implementing a headless browser using Selenium.WebDriver.ChromeDriver, and it worked very well on my local machine. However, our production environment is very restricted: only Internet Explorer is available, and we are not allowed to install any other browser, so chromedriver did not work there either. Internet Explorer cannot even fully render the website when I open it in the browser itself, so IE is definitely out.

Is there a way to scrape a JavaScript-generated website without having to install a browser? For example, by running a headless browser on a server that does not have that browser installed? Or is this a dead end? Thanks!

You can try a solution that bundles a fully functional Chromium and doesn't require installing Google Chrome in the target environment. All the required Chromium binaries ship with the solution.

There are several such solutions for .NET and C#:

CefSharp

An open source .NET wrapper around the Chromium Embedded Framework (CEF). It allows you to embed Chromium in .NET apps.

Supported by the community. If you need help using the library, read the docs or ask the community. If you need a feature or a bug fix, you will probably have to implement it yourself.
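As a rough illustration of the CefSharp route, here is a minimal sketch of a console app that loads a page in an off-screen Chromium and reads back the DOM after scripts have run. It assumes the CefSharp.OffScreen NuGet package is referenced; the URL, the class name, and the two-second wait are placeholders I picked for the example, not anything from the question. Check the CefSharp documentation for the runtime prerequisites on your server before relying on it.

```csharp
// Requires the CefSharp.OffScreen NuGet package, which ships the Chromium
// binaries with the application (no browser install on the server).
using System;
using System.Threading.Tasks;
using CefSharp;
using CefSharp.OffScreen;

public static class RenderedPageScraper
{
    // Placeholder URL, for illustration only.
    private const string TargetUrl = "https://example.com/";

    public static void Main()
    {
        // Cef.Initialize and Cef.Shutdown must be called on the same thread,
        // so Main stays synchronous and blocks on the async work.
        Cef.Initialize(new CefSettings());
        try
        {
            string html = GetRenderedHtmlAsync(TargetUrl).GetAwaiter().GetResult();
            Console.WriteLine(html);
        }
        finally
        {
            Cef.Shutdown();
        }
    }

    private static async Task<string> GetRenderedHtmlAsync(string url)
    {
        using (var browser = new ChromiumWebBrowser(url))
        {
            // Wait until the main frame has finished its initial load.
            LoadUrlAsyncResponse load = await browser.WaitForInitialLoadAsync();
            if (!load.Success)
            {
                throw new InvalidOperationException(
                    $"Load failed: {load.ErrorCode}, HTTP status {load.HttpStatusCode}");
            }

            // Assumption: a fixed delay is enough for the page's scripts to
            // finish building the DOM. A real crawler would more likely poll
            // for a known element instead.
            await Task.Delay(TimeSpan.FromSeconds(2));

            // Ask the page itself for the current DOM, so the result includes
            // whatever JavaScript has generated.
            JavascriptResponse js =
                await browser.EvaluateScriptAsync("document.documentElement.outerHTML");
            if (!js.Success)
            {
                throw new InvalidOperationException(js.Message);
            }
            return (string)js.Result;
        }
    }
}
```

The returned string can then be handed to HtmlAgilityPack or AngleSharp for parsing, since at that point it is ordinary, fully rendered HTML.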

DotNetBrowser

A commercial library that lets you integrate a Chromium-based browser into your .NET app to display and process HTML5, CSS3, JavaScript, etc.

It's a proprietary solution supported by a commercial company. If you need help using the library, read the docs or get help from the product's engineers. If you need a feature or a bug fix, the product team will handle it as soon as possible. I know that because I know the engineers on the DotNetBrowser team.

WebView2

This control allows you to embed web technologies (HTML, CSS, and JavaScript) in your native apps. The WebView2 control uses Microsoft Edge (Chromium) as the rendering engine to display the web content in native apps. With WebView2, you can embed web code in different parts of your native app, or build the entire native app within a single WebView instance. Supported by Microsoft.

If you need help, you should contact the WebView2 team.
