
Get a JavaScript variable from a web page without interaction (headlessly)

Original post: 2022-11-24 17:48:24. Tags: javascript, api, headless

Good afternoon. We're looking to get a JavaScript variable from a web page, which we can usually retrieve by typing `app` in the Chrome DevTools console.

However, we'd like to do this headlessly, because it has to be performed on numerous apps.

Our ideas:

Using a Puppeteer instance to open the page, run the command and return the variable. This works, but it's very resource-consuming.
Sending a GET/POST request to the page to try to inject the JS command, which did not succeed.
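If Puppeteer stays in the picture, much of its cost may come from launching Chrome per page and downloading assets rather than from the lookup itself. The TypeScript sketch below assumes the `puppeteer` npm package; `readGlobal` and `readGlobalFromMany` are hypothetical helper names, and the list of blocked resource types is an assumption that has to be checked per site. It launches one headless browser for many URLs, blocks images, stylesheets, fonts and media, waits until the variable is defined, and then evaluates its bare name, which is what typing it in the DevTools console does.

```typescript
import puppeteer, { type Browser } from "puppeteer";

// Assumption: these resource types do not influence the variable we read.
// Remove an entry if a page needs it (e.g. a script that measures CSS).
const BLOCKED_TYPES = new Set(["image", "stylesheet", "font", "media"]);

// Open one tab in an already-running browser, read a global, close the tab.
async function readGlobal(browser: Browser, url: string, name: string): Promise<unknown> {
  const page = await browser.newPage();
  try {
    await page.setRequestInterception(true);
    page.on("request", (request) => {
      if (BLOCKED_TYPES.has(request.resourceType())) {
        void request.abort();
      } else {
        void request.continue();
      }
    });
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 30_000 });
    // The variable may be assigned by a later or async script, so poll for it.
    await page.waitForFunction(`typeof ${name} !== "undefined"`, { timeout: 10_000 });
    // Evaluating the bare name is what typing it in the DevTools console does.
    // The value is serialized back to Node; functions and DOM nodes do not survive.
    return await page.evaluate(name);
  } finally {
    await page.close();
  }
}

// Launch Chrome once and reuse it for every URL instead of once per URL.
async function readGlobalFromMany(urls: string[], name: string): Promise<Map<string, unknown>> {
  const browser = await puppeteer.launch({ headless: true });
  const results = new Map<string, unknown>();
  try {
    for (const url of urls) {
      results.set(url, await readGlobal(browser, url, name));
    }
  } finally {
    await browser.close();
  }
  return results;
}
```

For example, `await readGlobalFromMany(["https://example.com/app1", "https://example.com/app2"], "app")` would return one value per URL (the URLs are placeholders). Running several `readGlobal` calls in parallel on the same browser is a further option if memory allows.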

So we're wondering whether there is an easier solution, such as a special API that could extract the variable. The goal is to automate this process with no human interaction.

Thanks for your help!

2 answers

You can embed Chrome into your application and instrument it. It will be headless.
We've used this approach in the past to copy content from PowerPoint Online.

We were using .NET to do this and therefore used CEFSharp.

Your question is not so much about a JS API (since the web page is not yours to edit, you can only request it) as it is about web crawling / browser automation.

You have to add details to get a definitive answer, but I see two scenarios:

the website actively checks for evidence of human browsing (for example, it sits behind CloudFlare and has enabled this option), or the scripts depend heavily on a browser execution environment being available. In this case, the simplest option is to automate a real browser, because a headless option has to get many things right to fool the server or the scripts. I would use Karate, which is easier than, say, Selenium and can execute in-browser scripts. It is written in Java, but you can run it externally and just read its reports.
the website does not check for such evidence and the scripts do not really require a browser execution environment. Then you can simply download everything required locally and attempt to jury-rig the JS into executing in any JS environment. According to your post, this fails, but it is impossible to help unless you can describe how it fails. This option can be headless.
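The second scenario can be sketched in TypeScript on Node.js: take the page HTML, run its inline scripts in a fresh `node:vm` context with a minimal stand-in `window`, and read the variable. This is a minimal sketch under stated assumptions: `extractInlineScripts` and `readGlobalFromHtml` are hypothetical names, script tags are found with a regex rather than a real HTML parser, external `<script src>` files and ES modules are skipped (they would have to be downloaded and handled separately), and any script that touches `document`, `navigator` or another browser API throws a `ReferenceError`, which is exactly the "how it fails" detail worth reporting. `node:vm` is not a security sandbox, so only run code you are prepared to trust.

```typescript
import * as vm from "node:vm";

// Return the bodies of inline classic scripts, in document order.
// A regex is a rough heuristic here, not a real HTML parser.
function extractInlineScripts(html: string): string[] {
  const scripts: string[] = [];
  const scriptTag = /<script\b([^>]*)>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptTag.exec(html)) !== null) {
    const attrs = match[1];
    // External scripts would have to be downloaded separately.
    if (/\bsrc\s*=/i.test(attrs)) continue;
    // Skip JSON blobs, templates and ES modules; only run classic JS.
    const type = /\btype\s*=\s*["']?([^"'\s>]+)/i.exec(attrs);
    if (type && !/^(text|application)\/javascript$/i.test(type[1])) continue;
    scripts.push(match[2]);
  }
  return scripts;
}

// Run the inline scripts in one fresh context (they share globals, as
// separate <script> tags do in a browser) and read a global by name.
// Throws if a script needs a real browser API such as `document`.
function readGlobalFromHtml(html: string, name: string, timeoutMs = 1000): unknown {
  if (!/^[A-Za-z_$][\w$]*$/.test(name)) {
    throw new Error(`not a plain identifier: ${name}`);
  }
  // `console` is not a JS builtin, so pass Node's in explicitly.
  const context = vm.createContext({ console });
  // Minimal stand-in so scripts that write `window.foo` still work.
  vm.runInContext("globalThis.window = globalThis; globalThis.self = globalThis;", context);
  for (const code of extractInlineScripts(html)) {
    vm.runInContext(code, context, { timeout: timeoutMs });
  }
  // Serialize inside the context so only plain data crosses back into Node;
  // a missing variable (or a function value) comes back as undefined.
  const json = vm.runInContext(
    `JSON.stringify(typeof ${name} === "undefined" ? undefined : ${name})`,
    context,
    { timeout: timeoutMs },
  ) as string | undefined;
  return json === undefined ? undefined : JSON.parse(json);
}
```

Usage, assuming Node 18+ with its global `fetch`: `readGlobalFromHtml(await (await fetch(url)).text(), "app")`. If this throws, the error message names the missing browser API, which tells you whether the first scenario (automating a real browser) is the one that applies.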