Dynamically loading external JavaScript resources in the Splash HTTP rendering service

I'm currently using the Splash HTTP API as a headless browser to render requests. I'm using the render.html endpoint with js_source, which allows custom JavaScript code to be evaluated within the page context, after the page finishes loading and before the page is rendered.

I need to make additional requests to external resources, such as loading jQuery after the page has loaded.

var script = document.createElement('script');
script.type = 'text/javascript';
script.src = "https://code.jquery.com/jquery-1.5.1.min.js";
document.getElementsByTagName('head')[0].appendChild(script);

The problem is that when I do this, the library's objects do not become available within the page context, even though the script does appear inside the HEAD element of the final rendered HTML source:

<script type="text/javascript" src="https://code.jquery.com/jquery-1.5.1.min.js"></script>

I tried setting a callback using both handlers shown below to make sure the script is loaded before accessing any of jQuery's methods, but the callback is never invoked in either case.

script.onreadystatechange = callback;
script.onload = callback;
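
For debugging this kind of problem, it can help to make the "callback never fires" case visible instead of silent. The following is a minimal sketch, not part of the original question: `loadScript` and its `timeoutMs` parameter are hypothetical names, and `doc` is passed in (normally `document`) so the helper can be exercised outside a browser. It is written in ES5 style on the assumption that the rendering engine may not support newer syntax.

```javascript
// Sketch: inject a <script> tag and report 'loaded', 'error', or 'timeout' exactly once.
// `loadScript`, `doc`, and `timeoutMs` are illustrative names, not a Splash API.
function loadScript(doc, src, timeoutMs, done) {
    var script = doc.createElement('script');
    var finished = false;

    // If neither handler fires in time, report a timeout instead of staying silent.
    var timer = setTimeout(function () { finish('timeout'); }, timeoutMs);

    function finish(status) {
        if (finished) { return; }          // report only the first outcome
        finished = true;
        clearTimeout(timer);
        script.onload = script.onerror = script.onreadystatechange = null;
        done(status, script);
    }

    script.onload = function () { finish('loaded'); };
    script.onerror = function () { finish('error'); };
    // Older engines signal completion through readyState instead of onload.
    script.onreadystatechange = function () {
        if (script.readyState === 'loaded' || script.readyState === 'complete') {
            finish('loaded');
        }
    };

    script.type = 'text/javascript';
    script.src = src;
    doc.getElementsByTagName('head')[0].appendChild(script);
    return script;
}
```

In a page it would be called as `loadScript(document, "https://code.jquery.com/jquery-1.5.1.min.js", 2000, function (status) { ... })`. Where neither handler is ever invoked, the callback would receive 'timeout', provided the page stays alive long enough for the timer to run.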

Running the same scripts in Chrome's console does what I need: jQuery immediately becomes available within the page context.

You can probably make it work with js_source, but as a feature js_source is quite limited; the /execute endpoint with a custom Lua script is much more versatile, and often easier to use:

function main(splash)
    splash:autoload("https://code.jquery.com/jquery-1.5.1.min.js")
    assert(splash:go(splash.args.url))
    assert(splash:wait(1.0))
    splash:runjs(splash.args.js_source)
    return splash:html()
end

This script emulates the render.html endpoint but preloads jQuery; it supports the 'url' and 'js_source' arguments and hardcodes 'wait' to 1.0.
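
To show how such a script might be submitted, here is a rough sketch that builds a POST request for /execute with a JSON body, where `lua_source` carries the script and the remaining fields become `splash.args`. The helper name `buildExecuteRequest` is hypothetical, and the base address `http://localhost:8050` in the usage note is an assumption about where Splash is running.

```javascript
// Sketch: describe a POST to Splash's /execute endpoint for the Lua script above.
// `buildExecuteRequest` is an illustrative helper, not part of Splash.
var LUA_SOURCE = [
    'function main(splash)',
    '    splash:autoload("https://code.jquery.com/jquery-1.5.1.min.js")',
    '    assert(splash:go(splash.args.url))',
    '    assert(splash:wait(1.0))',
    '    splash:runjs(splash.args.js_source)',
    '    return splash:html()',
    'end'
].join('\n');

function buildExecuteRequest(splashBase, pageUrl, jsSource) {
    return {
        // Strip trailing slashes so the path is not doubled.
        url: splashBase.replace(/\/+$/, '') + '/execute',
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        // Fields other than lua_source are read by the script via splash.args.
        body: JSON.stringify({
            lua_source: LUA_SOURCE,
            url: pageUrl,
            js_source: jsSource
        })
    };
}
```

For example, `buildExecuteRequest("http://localhost:8050", "http://example.com", "document.title = jQuery.fn.jquery;")` yields an object that can be handed to whatever HTTP client is in use.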

From what I can see, the autoload command of the /execute endpoint downloads the resource asynchronously through an HTTP GET in Python ( https://github.com/scrapinghub/splash/blob/master/splash/qtrender_lua.py#L898 ), then simply evaluates the JS in much the same way js_source does ( https://github.com/scrapinghub/splash/blob/master/splash/browser_tab.py#L655 ).

So there is no way of adding/downloading external resources from within the browser context, as you would usually do with an HTML script element :(

I finally managed to solve the issue. It looks like Splash doesn't re-evaluate the DOM upon changes; what worked for me was to make a synchronous XMLHttpRequest to the resource and then evaluate the response:

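var src = "https://code.jquery.com/jquery-1.5.1.min.js";

// Third argument `false` makes the request synchronous, so send() blocks
// until the response has arrived.
var request = new XMLHttpRequest();
request.open('GET', src, false);
request.send(null);

// Evaluate the downloaded script only if the request succeeded.
if (request.status === 200) {
    eval(request.responseText);
}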
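
If several resources need to be loaded this way, the same approach can be wrapped in a small helper. This is only a sketch under stated assumptions: `loadScriptSync` and `globalEval` are hypothetical names, and the XHR constructor and evaluator are passed in as parameters so the logic can be checked without a browser. The indirect `(0, eval)` call is used so that the fetched code runs in the global scope rather than inside the helper function.

```javascript
// Sketch: fetch a script synchronously and evaluate it, returning whether it succeeded.
// `loadScriptSync` and `globalEval` are illustrative names, not part of Splash.
function globalEval(code) {
    // An indirect eval call evaluates the code in the global scope.
    return (0, eval)(code);
}

function loadScriptSync(src, XhrCtor, evaluate) {
    var request = new XhrCtor();
    request.open('GET', src, false);   // false = synchronous request
    request.send(null);

    if (request.status !== 200) {
        return false;                  // leave the page untouched on failure
    }
    evaluate(request.responseText);
    return true;
}
```

Inside the page this would be called as `loadScriptSync("https://code.jquery.com/jquery-1.5.1.min.js", XMLHttpRequest, globalEval)`, and the boolean result indicates whether the library was evaluated.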

