How can I extract HTML code from secure URLs (https://)?

I used the JSoup library, but I'm not getting the full HTML content from secure URLs (https://), because the page loads its content dynamically. Is there any way to get the exact HTML content of secure URLs (https://)?

To parse the complete content, you can use Selenium together with Jsoup. Selenium drives a real browser that executes the page's JavaScript, and Jsoup then parses the page source the browser reports.

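WebDriver driver = new ChromeDriver();
driver.get("https://google.com/");
// getPageSource() returns the browser's current DOM, including content that scripts have inserted so far
Document doc = Jsoup.parse(driver.getPageSource());
driver.quit(); // close the browser once the source has been captured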

You can also wait explicitly until the page has finished loading (document.readyState is "complete") before reading the page source, as shown below:

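import java.time.Duration;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;

// Blocks until the browser reports document.readyState == "complete", for at most 30 seconds
public void waitForLoad(WebDriver driver) {
    ExpectedCondition<Boolean> pageLoadCondition = d ->
            "complete".equals(((JavascriptExecutor) d).executeScript("return document.readyState"));
    // Selenium 4 expects the timeout as a Duration rather than a number of seconds
    WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(30));
    wait.until(pageLoadCondition);
}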
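WebDriverWait works by polling: it re-evaluates the condition repeatedly until the condition succeeds or the timeout expires. The sketch below illustrates that idea using only the JDK and assumes nothing about Selenium's internals. `PollUntil` and `pollUntil` are hypothetical names, not Selenium API, and the simulated condition stands in for the `document.readyState` check.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

final class PollUntil {

    // Hypothetical helper (not part of Selenium): re-evaluates `condition`
    // every `interval` until it returns true or `timeout` has elapsed.
    static void pollUntil(BooleanSupplier condition, Duration timeout, Duration interval)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return; // condition satisfied, stop waiting
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Condition not met within " + timeout);
            }
            Thread.sleep(interval.toMillis()); // pause before the next check
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated page: the condition first becomes true on the third check.
        int[] checks = {0};
        pollUntil(() -> ++checks[0] >= 3, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("ready after " + checks[0] + " checks"); // prints "ready after 3 checks"
    }
}
```

A short interval keeps the wait responsive. A longer interval reduces how often the condition runs, which matters when each check is a script round-trip to the browser.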
