How do I remove the stack overflow from this casperjs code (using setTimeout)?

The following sample resembles my actual code:

function runCode() {
    casper.then(function(){
        if (condition){
            return;
        }
    });

    .... code .....
    .... code .....

    casper.then(function(){
        setTimeout(runCode(), 1000);
    });
}

function startScript() {
    .... code ....
    .... code ....

    casper.then(function(){
        runCode();
    });

    casper.then(function(){
        setTimeout(startScript(),5000);
    });
}

startScript();

This code is running on a VPS and seems to fill up all 512 MB of RAM. It starts at around 50 MB and fills the memory within a few hours. So I suspect the way I'm implementing the infinite loop creates new stack frames without destroying the old ones.

How I want to implement this: Execution starts with startScript(), which calls another function, runCode(). This runCode function has to run infinitely in a loop. I'm trying to do that using the setTimeout function.

There is a condition upon reaching which the whole script has to start again, so I use return to go back to the startScript() function and then restart it with another setTimeout() call.

The specific condition I'm talking about has not been encountered in my script in the last few hours. So I suspect the memory usage comes from within the runCode() function. Please give me some suggestions to remove this memory usage problem.

Update: I was passing the function's return value (which was null or undefined) as the argument to setTimeout(), and for this the function had to run once, which was causing the stack overflow. As suggested by Artjom B., I tried the following code, but the function passed as argument to setTimeout is not being invoked.

function runCode() {

    console.log("inside runcode");
    casper.then(function(){
    ...
    ...
    // call to other functions
    });

    //setTimeout(runCode, 1000); --------------- [i]

    casper.then(function(){
        console.log("just before setTimeout");
        setTimeout(runCode, 1000);
    });
}
runCode();

I get the following output:

inside runcode
console.log messages from the other functions and codes in between.
just before setTimeout

Then it exits.

If I use the commented-out code indicated by [i] and comment out the lines after it, I get an infinite loop like this: inside runcode inside runcode inside runcode .... .... I don't know what is wrong. Please suggest something.

Update 2: Thank you Artjom B. for picking up another flaw in my code. There seems to be a problem with the setTimeout() function. When I run the code in this paste: http://pastebin.com/W9DD6YpB, it doesn't seem to run infinitely as it is supposed to.

Update 3: As explained by Artjom B., the asynchronous nature of JavaScript is causing casper to think there is no more code left to execute, so it exits before the function queued by setTimeout gets invoked. I'm wondering if adding some code afterwards will keep casper from exiting. For example, the function queued by setTimeout() waits 1000 ms before being invoked. So a casper.wait(2000) should do the job, but I don't know if there will still be stack overflow problems: http://pastebin.com/ybKWH5KX

After some discussion in the comments, it became clear that an approach with setTimeout either doesn't work or is rather hard to read and maintain.

Stack frames

Your concern about uncollected stack frames from recursive calls of runCode and startScript is unfounded, since CasperJS internally works with setTimeout. So you should use the functions that are provided by CasperJS.

You can do this recursively (nesting of steps), because CasperJS handles this well: it uses a queue and inserts new steps after the currently executing step.
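
The queue behaviour described above can be sketched with a small stand-in runner. This is only an illustration in plain JavaScript: MiniFlow, its then/run methods and the added counter are hypothetical names, and the synchronous loop is a simplification rather than CasperJS's actual implementation.

```javascript
// Hypothetical stand-in for a step runner; this is NOT CasperJS's real code.
function MiniFlow() {
    this.steps = [];     // all scheduled step functions
    this.current = -1;   // index of the step being executed (-1: not running)
    this.added = 0;      // steps added so far by the current step
}

// Outside a step: append. Inside a step: insert right after the current
// step (and after anything that step has already added).
MiniFlow.prototype.then = function (fn) {
    if (this.current < 0) {
        this.steps.push(fn);
    } else {
        this.steps.splice(this.current + 1 + this.added, 0, fn);
        this.added++;
    }
    return this;
};

// Every step is called from this loop, so a "recursive" runCode() call
// returns before the steps it scheduled are executed.
MiniFlow.prototype.run = function () {
    for (this.current = 0; this.current < this.steps.length; this.current++) {
        this.added = 0;
        this.steps[this.current].call(this);
    }
    this.current = -1;
};

var flow = new MiniFlow();
var log = [];
var rounds = 0;

function runCode() {
    flow.then(function () { log.push("round " + rounds); });
    flow.then(function () {
        rounds++;
        if (rounds < 3) {
            runCode(); // only schedules two more steps, then returns
        }
    });
}

runCode();
flow.then(function () { log.push("done"); });
flow.run();
console.log(log.join(", ")); // round 0, round 1, round 2, done
```

The nested steps land before the "done" step because they are inserted directly after the step that scheduled them, and the call stack stays flat because each runCode() call has already returned when its steps run.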

Stop condition

You would need to move the stop condition to the recursive call, because in such asynchronous code this

function runCode() {
    casper.then(function(){
        if (condition){
            return;
        }
    });
    //...
}

doesn't actually stop the execution of runCode, because it just returns from the function inside of the then block.
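
To see this without CasperJS, here is a minimal sketch in plain JavaScript. The schedule helper and the queued array are hypothetical stand-ins for casper.then and its step queue; they only store the callback and run it later.

```javascript
var events = [];
var queued = [];   // hypothetical stand-in for the step queue

// Stores the callback for later, roughly like scheduling a step would
function schedule(fn) {
    queued.push(fn);
}

function runCode(condition) {
    schedule(function () {
        if (condition) {
            events.push("callback returned early");
            return; // leaves only this inner function
        }
        events.push("callback ran fully");
    });
    // This line is reached no matter what the callback does later
    events.push("runCode reached its end");
}

runCode(true);
queued.forEach(function (fn) { fn(); }); // the "steps" run afterwards
console.log(events);
// [ 'runCode reached its end', 'callback returned early' ]
```

runCode has already finished by the time the callback runs, so the return inside it has nothing left to stop.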

Replace setTimeout

You would then replace setTimeout in:

function runCode() {
    //...
    casper.then(function(){
        if (!condition){
            setTimeout(runCode, 1000);
        }
    });
}

with the proper casper functions:

function runCode() {
    //...
    casper.wait(1000);
    casper.then(function(){
        if (!condition){
            runCode();
        }
    });
}

You need to do the same replacement in startScript, from this:

casper.then(function(){
    setTimeout(startScript,5000);
});

to

casper.wait(5000);
casper.then(function(){
    startScript();
});

On keeping setTimeout

If you really want to keep setTimeout, then you would need to do double bookkeeping. By calling a function with setTimeout you break out of the controlled flow of casper steps.

For example, you may do something like this:

function someFunction(){
    casper.then(function(){
        // something
    });
}
casper.start(url);
casper.then(function(){
    setTimeout(someFunction, 5000);
});
casper.run();

The function inside then is actually the last scheduled step. When it is executed, it creates a timer that later starts a function, which in turn would add more steps to the flow. This will never happen, because casper has no way of knowing whether more steps will be scheduled, and since there currently aren't any (at the end of the then before run), it simply exits the complete script. On some platforms the underlying phantomjs might behave differently, though. setTimeout lets you break out of the control flow, which is not a good thing in this case.

To gain control back, you may do the following, as indicated in your paste:

function someFunction(){
    casper.then(function(){
        // something
    });
}
casper.start(url);
casper.then(function(){
    setTimeout(someFunction, 5000);
});
casper.wait(5100); // should be greater than the previous timeout
casper.run();

^ Do not do this. It is hard to read and error-prone. This can be simplified to:

casper.start(url);
casper.then(function(){
    // something
});
casper.wait(5000, someFunction); // added bonus because "this" now refers to casper
casper.run();

Proper callback invocation for setTimeout

You also have a problem with how the function is handed to setTimeout. The main problem is that you don't actually defer anything with setTimeout. See for example the line

setTimeout(startScript(),5000);

Here you invoke the startScript function without delay, because of the (), and pass its return value into setTimeout. I don't think you actually return anything from startScript. setTimeout will take the undefined without issuing a warning or error, but it can't execute anything after the timeout, because undefined isn't a function. In JavaScript, functions are first-class citizens: you can pass the function object into other functions.

You can fix this by removing () from the above line:

setTimeout(startScript,5000);

The same goes for

setTimeout(runCode, 1000);
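
The difference between the two call styles can be checked with a small sketch. To keep it synchronous, it uses fakeSetTimeout, a hypothetical recorder that only stores its arguments; it is not the real timer function.

```javascript
var calls = 0;
function startScript() {
    calls++; // no return statement, so the call evaluates to undefined
}

// Hypothetical recorder used instead of the real setTimeout
var scheduled = [];
function fakeSetTimeout(callback, delay) {
    scheduled.push({ callback: callback, delay: delay });
}

fakeSetTimeout(startScript(), 5000); // startScript runs right here
var callsAfterFirst = calls;

fakeSetTimeout(startScript, 5000);   // only the function object is passed
var callsAfterSecond = calls;

console.log(callsAfterFirst, typeof scheduled[0].callback);  // 1 undefined
console.log(callsAfterSecond, typeof scheduled[1].callback); // 1 function
```

Only the second form hands over something that can be called after the delay.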

(untested) Solution for removing previous casper steps

You really should run the script from cron or something like that, without the recursion. If you really don't want that, you may still be able to reduce the memory consumption.

The steps that are scheduled via then*, wait* and some other functions are managed in the internal casper.steps property. They are not cleared once they are executed, so that may be the cause of your memory leak. You may try to clear them like this:

casper.clearSomeSteps = function(min, keep){
    var len = casper.steps.length;
    min = min || 1000; // only run when at least 1000 steps are scheduled
    keep = keep || 100; // keep 100 of the newer steps
    if (len < min) return; // not yet needed

    this.step -= len-keep; // change the index of the current step
    this.steps = Array.prototype.slice.call(this.steps, len-keep); // do the slice
};

Call this.clearSomeSteps() at the beginning of startScript. This might not be the whole solution, though, as there is also casper.waiters.
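
The index arithmetic of the snippet can be tried on a plain object first. The sketch below assumes nothing about CasperJS beyond the two property names used above (steps and step); the flow object and the step numbers are made up for illustration.

```javascript
// Plain-object stand-in with the two properties the snippet touches
var flow = { steps: [], step: 0 };
for (var i = 0; i < 1200; i++) {
    flow.steps.push("step " + i);
}
flow.step = 1150; // pretend step 1150 is the one currently executing

// Same arithmetic as clearSomeSteps above, applied to the stand-in
function clearSomeSteps(target, min, keep) {
    var len = target.steps.length;
    min = min || 1000;  // only trim when at least this many steps exist
    keep = keep || 100; // number of newest steps to keep
    if (len < min) return;

    target.step -= len - keep;                     // shift the current index
    target.steps = target.steps.slice(len - keep); // drop the oldest steps
}

var before = flow.steps[flow.step];
clearSomeSteps(flow);
var after = flow.steps[flow.step];

console.log(flow.steps.length, flow.step, before === after); // 100 50 true
```

The shifted index still points at the same step only if the current step is among the last keep entries. If it were older than that, the index would become negative, so this is worth checking before relying on the trick.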
