
Headless Chrome - getting blank page source

I'm trying to load a website in headless Chrome using Selenium WebDriver, and I'm running into a problem with some specific websites. The page loads; for the first 2-3 seconds it shows a page saying "please enable javascript...", and after about 3 seconds the page source goes blank.

I have used Selenium, and Chrome in particular, for a long time and am familiar with the platform. For this case I'm using Chrome version 73.0.3683.86 and ChromeDriver 2.46.628411 (which are compatible according to "Which ChromeDriver version is compatible with which Chrome Browser version?") on macOS. The Selenium Java version is the latest, 3.141.59.

I suspect that headless Chrome cannot handle specific content types such as "svg", or other GUI-related HTTP responses.

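    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.chrome.ChromeOptions;

    public class HeadlessPageSource {
        public static void main(String[] args) throws InterruptedException {
            ChromeOptions chromeOptions = new ChromeOptions();
            chromeOptions.addArguments("--headless");
            WebDriver driver = new ChromeDriver(chromeOptions);
            try {
                driver.get("https://identity.tescobank.com/login");
                // Fixed 3-second pause to give the page's JavaScript time to run
                Thread.sleep(3000);
                System.out.println(driver.getPageSource());
            } finally {
                // Always shut the browser down, even if loading fails
                driver.quit();
            }
        }
    }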

The expected result is a page source identical to the one shown in non-headless mode.

Headless Chrome should be able to handle everything that regular Chrome can:

It brings all modern web platform features provided by Chromium and the Blink rendering engine to the command line.

(see https://developers.google.com/web/updates/2017/04/headless-chrome)

Since only the login page of a bank causes you trouble, my guess is that the page's security measures detect an anomaly and decide not to serve you.

One way they can do that is by looking at the User-Agent string, which contains HeadlessChrome in headless mode.
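To illustrate the kind of check described here, the following is a minimal sketch that assumes the detection is a plain substring match on the User-Agent value. That is only a guess; the bank's actual logic is unknown and may rely on other signals. The class `UserAgentCheck` and its two helpers are hypothetical names, and the sample string is an illustration of the format headless Chrome 73 reports on macOS, not a captured value.

```java
final class UserAgentCheck {
    /** Returns true if the User-Agent string advertises headless Chrome. */
    static boolean looksHeadless(String userAgent) {
        return userAgent != null && userAgent.contains("HeadlessChrome");
    }

    /** Replaces the headless product token with the one regular Chrome sends. */
    static String asRegularChrome(String userAgent) {
        return userAgent.replace("HeadlessChrome", "Chrome");
    }

    public static void main(String[] args) {
        // Illustrative value only (assumed format, not taken from the question).
        String headlessUa = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) "
                + "HeadlessChrome/73.0.3683.86 Safari/537.36";

        System.out.println(looksHeadless(headlessUa));   // the headless token is present
        String regularUa = asRegularChrome(headlessUa);
        System.out.println(looksHeadless(regularUa));    // the token has been rewritten
        System.out.println(regularUa);
    }
}
```

In a Selenium session, the real value can be read with `((JavascriptExecutor) driver).executeScript("return navigator.userAgent")`, and Chrome accepts an override through `chromeOptions.addArguments("--user-agent=" + regularUa)`. Comparing the page source with and without the override is one way to test the guess on a site you are allowed to test; it may well make no difference if the site uses other signals.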

That said, unless you're writing integration tests for the bank, your behavior is suspicious at the very least. If you have a valid and legal reason, clear it with the bank first. Otherwise they might take action against you, such as blocking your IP address (which could affect many people) or asking the police to have a word with you.

I was facing a similar issue in my script after login. Somehow, refreshing the page resolved the issue: driver.navigate().refresh();
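The refresh workaround can be wrapped in a small retry loop. This is a rough sketch rather than a tested fix: `RefreshRetry`, `sourceWithRefresh`, and the parameter names are hypothetical, and what counts as "blank" is left to the caller because that is an assumption too (a browser may return an empty HTML shell rather than an empty string). The helper uses only the standard library so it can be tried without a browser; the `main` method simulates a page that becomes non-blank after two refreshes.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

final class RefreshRetry {
    /**
     * Reads the page source and, while it looks blank, refreshes and reads again.
     * Returns the first non-blank source, or the last value read if every attempt is blank.
     */
    static String sourceWithRefresh(Supplier<String> readSource,
                                    Runnable refresh,
                                    Predicate<String> looksBlank,
                                    int maxRefreshes) {
        String source = readSource.get();
        for (int i = 0; i < maxRefreshes && looksBlank.test(source); i++) {
            refresh.run();
            source = readSource.get();
        }
        return source;
    }

    public static void main(String[] args) {
        // Stand-in for a browser: the first two reads are blank, the third has content.
        String[] reads = {"", "   ", "<html><body>login form</body></html>"};
        int[] next = {0};
        int[] refreshes = {0};

        String result = sourceWithRefresh(
                () -> reads[Math.min(next[0]++, reads.length - 1)],
                () -> refreshes[0]++,
                s -> s == null || s.trim().isEmpty(),
                5);

        System.out.println(result);
        System.out.println("refreshes: " + refreshes[0]);
    }
}
```

With Selenium, the first two arguments would be `driver::getPageSource` and `() -> driver.navigate().refresh()`. In practice a wait between the refresh and the next read is probably needed as well, since the page needs time to run its scripts.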

