简体   繁体   English

在检索其内容之前执行页面脚本

[英]Executing page scripts before retrieving it's contents

I've a page where i need to automate some tasks and scrape some data, but the page runs some JS after loading to inject some data into the DOM;我有一个页面,我需要在其中自动执行一些任务并抓取一些数据,但该页面在加载后运行一些 JS 以将一些数据注入 DOM; that i cannot intercept (not in a good format anyway), I was hoping to find a solution that is fast and not memory consuming.我无法拦截(无论如何都不是一个好的格式),我希望找到一个快速且不消耗内存的解决方案。

I've attempt to get the scripts myself and execute them using some headless driver (namely phantomJs) but it didn't update the page source and i'm not sure how to retrive the updated DOM from that我试图自己获取脚本并使用一些无头驱动程序(即 phantomJs)执行它们,但它没有更新页面源,我不知道如何从中检索更新的 DOM

var page = GetWebPage(url);
var scripts = page.Html.QuerySelectorAll("script");

var phantomDriver = new PhantomJSDriver(PhantomJSDriverService.CreateDefaultService(Directory.GetCurrentDirectory()));
phantomDriver.Navigate().GoToUrl(url);

foreach (var script in scripts)
    phantomDriver.ExecuteScript(script.InnerText);

var at = phantomDriver.PageSource;

You could use a 'wait'.您可以使用“等待”。 According to this link , Selenium has both implicit and explicit waits.根据此链接,Selenium 具有隐式和显式等待。 The example below is using an explicit wait.下面的示例使用显式等待。

To use an explicit wait, use WebDriverWait and ExpectedConditions .要使用显式等待,请使用WebDriverWaitExpectedConditions I'm not sure what language you're using but here's an example in python.我不确定您使用的是什么语言,但这里有一个 Python 示例。 This uses WebDriverWait in a try-catch block, allowing timeout seconds to meet the specified ExpectedConditions .这在 try-catch 块中使用WebDriverWait ,允许timeout秒数满足指定的ExpectedConditions As at June 2019, conditions are available in:截至 2019 年 6 月,条件适用于:

Example code in python: python中的示例代码:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://stackoverflow.com/questions/56724178/executing-page-scripts-before-retrieving-its-contents'
target = (By.XPATH, "//div[@class='gravatar-wrapper-32']")
timeout = 20  # Allow max 20 seconds to find the target

browser = webdriver.Chrome()
browser.get(url)
try:
    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located(target))
except TimeoutException:
    print("Timed out waiting for page to load")
    browser.quit()

The important bit is between try and except which you would modify to use the specific 'expected condition' you're interested in.重要的一点是在tryexcept之间,您可以修改它以使用您感兴趣的特定“预期条件”。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM