Tracking with JavaScript whether an Ajax request is in progress on a web page, or intercepting XMLHttpRequest through Selenium WebDriver

I am using Selenium WebDriver to crawl a web site that has infinite scroll. (This site is only an example; I will be crawling other web sites too!)

Problem statement:

Scroll down the infinite-scroll page, using Selenium WebDriver, until the content stops loading.

My approach: this is what I currently do.

Step 1: Scroll to the bottom of the page

JavascriptExecutor js = (JavascriptExecutor) driver;
// Define toBottom() and call it once: scroll to the largest candidate page height.
js.executeScript("function toBottom() {" +
                        "window.scrollTo(0, Math.max(document.documentElement.scrollHeight," +
                        "document.body.scrollHeight, document.documentElement.clientHeight));" +
                "}" +
                "toBottom();");

Then I wait for some time to let the Ajax request complete, like this:

Step 2: Explicitly wait for the Ajax request to finish

Thread.sleep(1000);

Then I run another JavaScript snippet to check whether the page is still scrollable:

Step 3: Check if the page is scrollable

//Alternative to document.height is to be used which is document.body.clientHeight
//refer to https://developer.mozilla.org/en-US/docs/DOM/document.height

    if((Long)js.executeScript("return " +
                                "(document.body.clientHeight-(window.pageYOffset + window.innerHeight))")>0)

If the above condition is true, I repeat Steps 1 to 3 until the condition in Step 3 is false.
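The three steps above can be sketched as a single loop. This is only a minimal sketch, not the code actually used in the question: the class name `InfiniteScroll`, the method `scrollToEnd`, and the `maxRounds` safety cap are hypothetical, and a plain `Function<String, Object>` stands in for Selenium's `JavascriptExecutor.executeScript` so that the example compiles without Selenium on the classpath. The result of Step 3 is read as a `Number` on the assumption that the driver may hand back either a `Long` or a `Double`.

```java
import java.util.function.Function;

final class InfiniteScroll {
    // Step 1 script: same scroll target as in the question.
    static final String SCROLL_TO_BOTTOM =
            "window.scrollTo(0, Math.max(document.documentElement.scrollHeight,"
            + " document.body.scrollHeight, document.documentElement.clientHeight));";
    // Step 3 script: pixels left below the viewport.
    static final String REMAINING =
            "return document.body.clientHeight - (window.pageYOffset + window.innerHeight);";

    /** Repeats Steps 1 to 3; returns the number of scroll rounds performed. */
    static int scrollToEnd(Function<String, Object> js, long pauseMillis, int maxRounds) {
        int rounds = 0;
        while (rounds < maxRounds) {
            js.apply(SCROLL_TO_BOTTOM);                       // Step 1
            rounds++;
            try {
                Thread.sleep(pauseMillis);                    // Step 2 (fixed wait)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            Number remaining = (Number) js.apply(REMAINING);  // Step 3
            if (remaining == null || remaining.doubleValue() <= 0) {
                break;                                        // nothing left to scroll
            }
        }
        return rounds;
    }

    public static void main(String[] args) {
        // Fake page for demonstration: three checks report 100, 50, then 0 pixels left.
        long[] left = {100L, 50L, 0L};
        int[] next = {0};
        Function<String, Object> fakePage =
                script -> script.startsWith("return") ? Long.valueOf(left[next[0]++]) : null;
        System.out.println("rounds: " + scrollToEnd(fakePage, 10L, 20));
    }
}
```

With Selenium, the `js` argument could be `script -> ((JavascriptExecutor) driver).executeScript(script)`.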

The problem: I do not want to use Thread.sleep(1000); in Step 2. Instead, I would like to use JavaScript to check whether the background Ajax request has finished, and then scroll down further if the condition in Step 3 is true.

PS: I am not the developer of the page, so I do not have access to the code running on it; I can only inject JavaScript into the web page (as in Steps 1 and 3). I also have to write generic logic that works for any web site that issues Ajax requests during infinite scroll.

I would be grateful if someone could spare some time here!

EDIT: OK, after struggling for two days, I have figured out that the pages I crawl through Selenium WebDriver can use any of these JavaScript libraries, and I will have to poll differently for each library. For example, for a web application using the jQuery API, I may wait for

(Long)((JavascriptExecutor)driver).executeScript("return jQuery.active")

to return zero.

Likewise, if the web application uses the Prototype JavaScript library, I will have to wait for

(Long)((JavascriptExecutor)driver).executeScript("return Ajax.activeRequestCount")

to return zero.
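Either counter can be polled with a timeout instead of a fixed sleep. The following is a minimal sketch under stated assumptions: the names `AjaxCounterWait` and `waitForZero` are hypothetical, and a `LongSupplier` stands in for the `executeScript` call that reads `jQuery.active` or `Ajax.activeRequestCount`, so the example runs without Selenium.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class AjaxCounterWait {
    /**
     * Polls the counter until it reads zero or the timeout elapses.
     * Returns true if zero was observed, false on timeout or interruption.
     */
    static boolean waitForZero(LongSupplier counter, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (counter.getAsLong() == 0) {
                return true;                       // no active requests reported
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;                      // gave up waiting
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Fake counter for demonstration: reads 2, then 1, then 0.
        AtomicLong active = new AtomicLong(2);
        LongSupplier fake = () -> active.getAndUpdate(v -> Math.max(0, v - 1));
        System.out.println("idle: " + waitForZero(fake, 1000L, 10L));
    }
}
```

With Selenium, the supplier could be `() -> (Long) js.executeScript("return jQuery.active")`, which still assumes the page actually defines `jQuery`.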

Now, the problem is: how do I write generic code that can handle most of the available JavaScript libraries?

The problems I am facing in implementing this:

1. How do I find out which JavaScript library the web application uses (with Selenium WebDriver in Java), so that I can then write the corresponding wait method? Currently, I am using this code.

2. This way I will have to write as many as 77 methods, one for each JavaScript library, so I need a better way to handle this scenario as well.

In short, I need to figure out, through Selenium WebDriver's Java implementation, whether the browser is making any call (Ajax or plain), with or without a JavaScript library.

PS: There are add-ons, Chrome's JavaScript Lib detector and Firefox's JavaScript Library detector, which detect the JavaScript library being used.

For web pages that respond with Ajax during infinite scroll (or other actions) and use the jQuery API, inject the following before starting to work with the page:

    //Inject the polling status variable
    js.executeScript("window.status = 'fail';");

    //Attach the Ajax call back method
    js.executeScript( "$(document).ajaxComplete(function() {" +
    "status = 'success';});");

Step 1: Remains the same as in the original question

Step 2: Poll the following script (this is what removes the need for Thread.sleep() and makes the logic more dynamic)

String aStatus = (String) js.executeScript("return status;");

if (aStatus != null && aStatus.equalsIgnoreCase("success")) {
    js.executeScript("status = 'fail';");
    break poolingLoop;
}

Step 3: No longer needed!

Conclusion: There is no need to call a blunt Thread.sleep(); again and again while using Selenium WebDriver!

This approach works well only if the web application uses the jQuery API.
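The polling loop that Step 2 breaks out of is not shown above, so here is one way it might look. This is a hedged sketch, not the original loop: `StatusPoll`, `waitForSuccess`, and the `maxAttempts` bound are hypothetical names, and a `Function<String, Object>` again stands in for `JavascriptExecutor.executeScript`. It returns instead of using a labeled `break`, and it resets the flag after a success so the next scroll round starts clean.

```java
import java.util.function.Function;

final class StatusPoll {
    /**
     * Reads the injected status flag up to maxAttempts times.
     * Returns true (and resets the flag to 'fail') once it reads "success".
     */
    static boolean waitForSuccess(Function<String, Object> js, int maxAttempts, long pollMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Object status = js.apply("return window.status;");
            if (status != null && "success".equalsIgnoreCase(status.toString())) {
                js.apply("window.status = 'fail';");   // reset for the next round
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;                                  // flag never flipped in time
    }

    public static void main(String[] args) {
        // Fake page for demonstration: the flag flips to "success" on the third read.
        String[] flag = {"fail"};
        int[] reads = {0};
        Function<String, Object> fakePage = script -> {
            if (script.startsWith("return")) {
                if (++reads[0] == 3) {
                    flag[0] = "success";
                }
                return flag[0];
            }
            flag[0] = "fail";                          // the reset script
            return null;
        };
        System.out.println("completed: " + waitForSuccess(fakePage, 10, 10L));
    }
}
```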

EDIT: Following the link given by @jayati, I injected one of these JavaScript snippets.

JavaScript one:

//XMLHttpRequest instrumentation/wrapping
var startTracing = function (onnew) {
    var OldXHR = window.XMLHttpRequest;

    // create a wrapper object that has the same interfaces as a regular XMLHttpRequest object
    // see http://www.xulplanet.com/references/objref/XMLHttpRequest.html for reference on XHR object
    var NewXHR = function() {
        var self = this;
        var actualXHR = new OldXHR();

        // private callbacks (for UI):
        // onopen, onsend, onsetrequestheader, onupdate, ...
        this.requestHeaders = "";
        this.requestBody = "";

        // emulate methods from regular XMLHttpRequest object
        this.open = function(a, b, c, d, e) { 
            self.openMethod = a.toUpperCase();
            self.openURL = b;
            ajaxRequestStarted = 'open';

            if (self.onopen != null && typeof(self.onopen) == "function") { 
                self.onopen(a,b,c,d,e); } 
            return actualXHR.open(a,b,c,d,e); 
        }
        this.send = function(a) {
            ajaxRequestStarted = 'send';

            if (self.onsend != null && typeof(this.onsend) == "function") { 
                self.onsend(a); } 
            self.requestBody += a;
            return actualXHR.send(a); 
        }
        this.setRequestHeader = function(a, b) {
            if (self.onsetrequestheader != null && typeof(self.onsetrequestheader) == "function") { self.onsetrequestheader(a, b); } 
            self.requestHeaders += a + ":" + b + "\r\n";
            return actualXHR.setRequestHeader(a, b); 
        }
        this.getRequestHeader = function() {
            return actualXHR.getRequestHeader(); 
        }
        this.getResponseHeader = function(a) { return actualXHR.getResponseHeader(a); }
        this.getAllResponseHeaders = function() { return actualXHR.getAllResponseHeaders(); }
        this.abort = function() { return actualXHR.abort(); }
        this.addEventListener = function(a, b, c) { return actualXHR.addEventListener(a, b, c); }
        this.dispatchEvent = function(e) { return actualXHR.dispatchEvent(e); }
        this.openRequest = function(a, b, c, d, e) { return actualXHR.openRequest(a, b, c, d, e); }
        this.overrideMimeType = function(e) { return actualXHR.overrideMimeType(e); }
        this.removeEventListener = function(a, b, c) { return actualXHR.removeEventListener(a, b, c); }

        // copy the values from actualXHR back onto self
        function copyState() {
            // copy properties back from the actual XHR to the wrapper
            try {
                self.readyState = actualXHR.readyState;
            } catch (e) {}
            try {
                self.status = actualXHR.status;
            } catch (e) {}
            try {
                self.responseText = actualXHR.responseText;
            } catch (e) {}
            try {
                self.statusText = actualXHR.statusText;
            } catch (e) {}
            try {
                self.responseXML = actualXHR.responseXML;
            } catch (e) {}
        }

        // emulate callbacks from regular XMLHttpRequest object
        actualXHR.onreadystatechange = function() {
            copyState();

            try {
                if (self.onupdate != null && typeof(self.onupdate) == "function") { self.onupdate(); } 
            } catch (e) {}

            // onreadystatechange callback            
            if (self.onreadystatechange != null && typeof(self.onreadystatechange) == "function") { return self.onreadystatechange(); } 
        }
        actualXHR.onerror = function(e) {

            ajaxRequestComplete = 'err';
            copyState();

            try {
                if (self.onupdate != null && typeof(self.onupdate) == "function") { self.onupdate(); } 
            } catch (e) {}

            if (self.onerror != null && typeof(self.onerror) == "function") { 
                return self.onerror(e); 
            } else if (self.onreadystatechange != null && typeof(self.onreadystatechange) == "function") { 
                return self.onreadystatechange(); 
            }
        }
        actualXHR.onload = function(e) {

            ajaxRequestComplete = 'loaded';
            copyState();

            try {
                if (self.onupdate != null && typeof(self.onupdate) == "function") { self.onupdate(); } 
            } catch (e) {}

            if (self.onload != null && typeof(self.onload) == "function") { 
                return self.onload(e); 
            } else if (self.onreadystatechange != null && typeof(self.onreadystatechange) == "function") { 
                return self.onreadystatechange(); 
            }
        }
        actualXHR.onprogress = function(e) {
            copyState();

            try {
                if (self.onupdate != null && typeof(self.onupdate) == "function") { self.onupdate(); } 
            } catch (e) {}

            if (self.onprogress != null && typeof(self.onprogress) == "function") { 
                return self.onprogress(e);
            } else if (self.onreadystatechange != null && typeof(self.onreadystatechange) == "function") { 
                return self.onreadystatechange(); 
            }
        }

        if (onnew && typeof(onnew) == "function") { onnew(this); }
    }

    window.XMLHttpRequest = NewXHR;

}
window.ajaxRequestComplete = 'no';//Make as a global javascript variable
window.ajaxRequestStarted = 'no';
startTracing();

Or JavaScript two:

var startTracing = function (onnew) {
    window.ajaxRequestComplete = 'no';//Make as a global javascript variable
    window.ajaxRequestStarted = 'no';

    XMLHttpRequest.prototype.uniqueID = function() {
        if (!this.uniqueIDMemo) {
            this.uniqueIDMemo = Math.floor(Math.random() * 1000);
        }
        return this.uniqueIDMemo;
    }

    XMLHttpRequest.prototype.oldOpen = XMLHttpRequest.prototype.open;

    var newOpen = function(method, url, async, user, password) {

        ajaxRequestStarted = 'open';
        /*alert(ajaxRequestStarted);*/
        this.oldOpen(method, url, async, user, password);
    }

    XMLHttpRequest.prototype.open = newOpen;

    XMLHttpRequest.prototype.oldSend = XMLHttpRequest.prototype.send;

    var newSend = function(a) {
        var xhr = this;

        var onload = function() {
            ajaxRequestComplete = 'loaded';
            /*alert(ajaxRequestComplete);*/
        };

        var onerror = function( ) {
            ajaxRequestComplete = 'Err';
            /*alert(ajaxRequestComplete);*/
        };

        xhr.addEventListener("load", onload, false);
        xhr.addEventListener("error", onerror, false);

        xhr.oldSend(a);
    }

    XMLHttpRequest.prototype.send = newSend;
}
startTracing();

By checking the status variables ajaxRequestStarted and ajaxRequestComplete from the Java code, one can determine whether an Ajax request was started or has completed.

Now I have a way to wait until an Ajax request is complete, and I can also find out whether an Ajax request was triggered by some action.
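On the Java side, the two status variables have to be turned into a decision. The sketch below shows one possible mapping; `AjaxFlags`, `State`, and `classify` are hypothetical names, and the inputs are assumed to be the strings read with calls such as `js.executeScript("return window.ajaxRequestStarted")`. Two observations about the injected scripts shape it: the first script writes 'err' while the second writes 'Err', so the comparison ignores case, and neither script ever resets the flags, so the Java code would need to set both back to 'no' before triggering the next action. With several requests in flight at once, two flags cannot tell them apart.

```java
final class AjaxFlags {
    enum State { IDLE, IN_FLIGHT, FINISHED, FAILED }

    /** Maps the two injected flag values to a single request state. */
    static State classify(String started, String complete) {
        if ("loaded".equalsIgnoreCase(complete)) {
            return State.FINISHED;                 // onload fired
        }
        if ("err".equalsIgnoreCase(complete)) {
            return State.FAILED;                   // onerror fired ('err' or 'Err')
        }
        if ("open".equals(started) || "send".equals(started)) {
            return State.IN_FLIGHT;                // started but not yet completed
        }
        return State.IDLE;                         // both flags still 'no'
    }

    public static void main(String[] args) {
        System.out.println(classify("no", "no"));
        System.out.println(classify("send", "no"));
        System.out.println(classify("send", "loaded"));
    }
}
```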

Approach 1:

Your approach is good; just a few changes would do the trick:

Step 1: Improve this step by calling the toBottom function at a regular interval using window.setInterval. When (c >= totalcount), call window.clearInterval.

Step 2: Instead of checking whether the page is still scrollable, check whether (c >= totalcount). Check this condition every 200 ms until (c >= totalcount) returns true.

FYI: If Step 1 doesn't work in all browsers, you can refer to line 5210 of Tata-Nano-Reviews-925076578.js and call it with the c variable check.

Approach 2:

Go to the jQuery API documentation and search for "ajax". You will find some callback handlers that can be used for Ajax requests.

Probably, set a variable before the request is sent and again after the response has been received.

In between, use your original method of scrolling to the bottom at a regular interval until you can scroll no further. At that point, clear the interval variable.

Then regularly check whether that interval variable is null. Null would mean that you have reached the bottom.

We had to solve the same problem and managed it with one long JavaScript expression. You just need to add checks to see which libraries are defined.

PS: Thanks for giving me an easy answer for how to check for in-progress Prototype requests!

E.g., handling jQuery and XHR/Prototype:

var jsExecutor = /*Get your WebDriverInstance*/ as IJavaScriptExecutor;
while (/*your required timeout here*/)
{
    // True when neither Prototype nor jQuery reports an active request.
    var ajaxIsComplete = (bool)jsExecutor.ExecuteScript(
        "return ((typeof Ajax === 'undefined') || Ajax.activeRequestCount == 0)" +
        " && ((typeof jQuery === 'undefined') || jQuery.active == 0);");
    if (ajaxIsComplete)
        return;
}

Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please cite this site's URL or the original address.