
Python Selenium AWS Lambda Change WebGL Vendor/Renderer For Undetectable Headless Scraper

Concept:

Using AWS Lambda functions with Python and Selenium, I want to create an undetectable headless Chrome scraper by passing a headless Chrome test. I check the undetectability of my headless scraper by opening the test page and taking a screenshot. I ran this test on a local IDE and on a Lambda server.


Implementation:

I will be using a Python library called selenium-stealth and will follow its basic configuration:

stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )

I implemented this configuration on a local IDE as well as on an AWS Lambda server to compare the results.


Local IDE:

Below are the test results from running on a local IDE:

[image]


Lambda Server:

When I run this on a Lambda server, both the WebGL Vendor and Renderer are blank, as shown below:

[image]
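
Besides the screenshot, the values can be read directly from the page, which helps tell apart the two usual reasons for a blank entry: no WebGL context could be created at all, or the context exists but the WEBGL_debug_renderer_info extension is not exposed. The following is a minimal sketch, not part of the original setup: the helper names read_webgl_info and describe_webgl_info are hypothetical, and driver is assumed to be any Selenium WebDriver instance that has already loaded a page.

```python
# JavaScript run inside the page; execute_script wraps it in a function body,
# so a top-level `return` hands the object back to Python as a dict.
WEBGL_PROBE_JS = """
const canvas = document.createElement('canvas');
const gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
if (!gl) {
  return {supported: false, vendor: null, renderer: null};
}
const ext = gl.getExtension('WEBGL_debug_renderer_info');
if (!ext) {
  return {supported: true, vendor: null, renderer: null};
}
return {
  supported: true,
  vendor: gl.getParameter(ext.UNMASKED_VENDOR_WEBGL),
  renderer: gl.getParameter(ext.UNMASKED_RENDERER_WEBGL)
};
"""


def read_webgl_info(driver):
    """Ask the current page for its WebGL vendor/renderer (hypothetical helper)."""
    return driver.execute_script(WEBGL_PROBE_JS)


def describe_webgl_info(info):
    """Turn the probe result into a short human-readable diagnosis."""
    if not info.get("supported"):
        return "WebGL context could not be created"
    if not info.get("vendor") and not info.get("renderer"):
        return "WebGL available, but vendor/renderer are not exposed"
    return f"{info['vendor']} / {info['renderer']}"
```

Calling print(describe_webgl_info(read_webgl_info(driver))) after driver.get(...) in both environments shows whether the Lambda run fails to create a context or only lacks the vendor/renderer strings.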

I even tried to manually change the WebGL Vendor/Renderer by injecting the following JavaScript through a CDP command:

driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {"source": "(function() {var getParameter = WebGLRenderingContext.prototype.getParameter; WebGLRenderingContext.prototype.getParameter = function(parameter) {if (parameter === 37445) {return 'VENDOR_INPUT';} if (parameter === 37446) {return 'RENDERER_INPUT';} return getParameter.call(this, parameter);};})();"})
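
For readability, the same kind of override can be generated from a small helper. This is only a sketch under stated assumptions: build_webgl_override_script and install_webgl_override are hypothetical names, and driver is assumed to be a Chrome WebDriver that supports execute_cdp_cmd. The script keeps a reference to the original getParameter and calls it with the right `this` for every parameter it does not spoof.

```python
import json

# Constants defined by the WEBGL_debug_renderer_info extension.
UNMASKED_VENDOR_WEBGL = 37445
UNMASKED_RENDERER_WEBGL = 37446


def build_webgl_override_script(vendor, renderer):
    """Return JavaScript that spoofs the WebGL vendor/renderer strings.

    json.dumps is used so the values are emitted as safely quoted JS literals.
    """
    return (
        "(() => {\n"
        "  const original = WebGLRenderingContext.prototype.getParameter;\n"
        "  WebGLRenderingContext.prototype.getParameter = function (parameter) {\n"
        f"    if (parameter === {UNMASKED_VENDOR_WEBGL}) {{ return {json.dumps(vendor)}; }}\n"
        f"    if (parameter === {UNMASKED_RENDERER_WEBGL}) {{ return {json.dumps(renderer)}; }}\n"
        "    return original.call(this, parameter);\n"
        "  };\n"
        "})();"
    )


def install_webgl_override(driver, vendor, renderer):
    """Register the override so it runs before any script on each new document."""
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": build_webgl_override_script(vendor, renderer)},
    )
```

Note that this patches WebGLRenderingContext only (not WebGL2RenderingContext), and it can only take effect when the page is able to obtain a WebGL context in the first place.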

Simply Put:

Is it possible to add a Vendor/Renderer on AWS Lambda? From my efforts so far, there seems to be no way to do it. I have also submitted this issue on the selenium-stealth GitHub repository.

WebGL

WebGL is a cross-platform, open web standard for a low-level 3D graphics API based on OpenGL ES, exposed to ECMAScript via the HTML5 Canvas element. WebGL at its core is a shader-based API using GLSL, with constructs that are semantically similar to those of the underlying OpenGL ES API. It follows the OpenGL ES specification closely, with some concessions for memory-managed languages such as JavaScript. WebGL 1.0 exposes the OpenGL ES 2.0 feature set; WebGL 2.0 exposes the OpenGL ES 3.0 API.

Now, with the availability of Selenium Stealth, building an undetectable scraper on top of a Selenium-driven, ChromeDriver-initiated browsing context has become much easier.


selenium-stealth

selenium-stealth is a Python package that helps prevent detection. It tries to make Python Selenium more stealthy. However, as of now selenium-stealth only supports Selenium with Chrome.

  • Code Block:

     from selenium import webdriver
     from selenium.webdriver.chrome.options import Options
     from selenium.webdriver.chrome.service import Service
     from selenium_stealth import stealth

     options = Options()
     options.add_argument("start-maximized")
     options.add_experimental_option("excludeSwitches", ["enable-automation"])
     options.add_experimental_option('useAutomationExtension', False)
     s = Service('C:\\BrowserDrivers\\chromedriver.exe')
     driver = webdriver.Chrome(service=s, options=options)

     # Selenium Stealth settings
     stealth(driver,
             languages=["en-US", "en"],
             vendor="Google Inc.",
             platform="Win32",
             webgl_vendor="Intel Inc.",
             renderer="Intel Iris OpenGL Engine",
             fix_hairline=True,
             )
     driver.get("https://bot.sannysoft.com/")

  • Browser Screenshot:

bot_sannysoft


Changing WebGL Vendor/Renderer in AWS Lambda

AWS Lambda enables us to deliver compressed WebGL websites to end users. When requested webpage objects are compressed, the transfer size is reduced, leading to faster downloads, lower cloud storage fees, and lower data transfer fees. Improved load times also directly influence viewer experience and retention, which helps improve website conversion and discoverability. With WebGL, websites are more immersive while still being accessible via a browser URL. With this technique, AWS Lambda can automatically compress the objects uploaded to S3.

[image: product-page-diagram_Lambda-RealTimeFileProcessing]

Background on compression and WebGL

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization. This capability is negotiated between the server and the client using HTTP headers, which may indicate that a resource being transferred, cached, or otherwise referenced is compressed. AWS Lambda on the server side supports the Content-Encoding header.

On the client side, most browsers today support brotli and gzip compression through HTTP headers (Accept-Encoding: deflate, br, gzip) and can handle the corresponding server response headers. This means browsers will automatically download and decompress content from a web server on the client side before rendering webpages to the viewer.
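
The header negotiation described above can be illustrated with the Python standard library alone. This is a simplified sketch, not how AWS Lambda or any particular web server implements it: parse_accept_encoding and encode_response are hypothetical helper names, q-values in the header are ignored, and only gzip is handled because brotli is not part of the standard library.

```python
import gzip


def parse_accept_encoding(header):
    """Return the set of encodings named in an Accept-Encoding header.

    Simplification: parameters such as ';q=0.5' are stripped and ignored.
    """
    return {
        part.split(";")[0].strip().lower()
        for part in header.split(",")
        if part.strip()
    }


def encode_response(body, accept_encoding):
    """Compress `body` (bytes) with gzip if the client advertises support.

    Returns (payload, headers); the Content-Encoding header tells the client
    which decompression to apply before rendering.
    """
    if "gzip" in parse_accept_encoding(accept_encoding):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}
```

For example, encode_response(page_bytes, "deflate, br, gzip") returns a gzip payload together with {"Content-Encoding": "gzip"}, while a client that sends only "identity" receives the body unchanged.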


Conclusion

Due to this constraint, you may not be able to change the WebGL Vendor/Renderer in AWS Lambda; doing so may directly affect how webpages are rendered to viewers and could become a bottleneck in UX.


tl; dr

You can find a couple of relevant detailed discussions in:
