
OutOfMemoryError while using HtmlUnit for scraping

I am using HtmlUnit to log in to a site and then download data from a table.

When I run my code, it throws java.lang.OutOfMemoryError and cannot run any further.

The following is my code:

WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_6);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setRedirectEnabled(true);
webClient.getCookieManager().setCookiesEnabled(true);
webClient.getOptions().setPrintContentOnFailingStatusCode(false);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setTimeout(50000);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setPopupBlockerEnabled(true);

HtmlPage htmlPage=webClient.getPage(url);
Thread.sleep(200);
// Log in
HtmlTextInput uname=(HtmlTextInput)htmlPage.getFirstByXPath("//*[@id=\"username\"]");
uname.setValueAttribute("xxx");
HtmlPasswordInput upass=(HtmlPasswordInput)htmlPage.getFirstByXPath("//*[@id=\"password\"]");
upass.setValueAttribute("xxx");
HtmlSubmitInput submit=(HtmlSubmitInput)htmlPage.getFirstByXPath("//*[@id=\"login-button\"]/input");
htmlPage=(HtmlPage) submit.click();
Thread.sleep(200);
webClient.waitForBackgroundJavaScript(10000);
for (int i = 0; i < 250; i++) {
    if (!htmlPage.asText().contains("Loading...")) {
        break;
    }
    synchronized (htmlPage) {
        htmlPage.wait(500);
    }
}

System.out.println(htmlPage.asText());

The following is the stack trace:

java.lang.OutOfMemoryError: Java heap space
at net.sourceforge.htmlunit.corejs.javascript.Node.newString(Node.java:155)
at net.sourceforge.htmlunit.corejs.javascript.Node.newString(Node.java:151)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.createPropertyGet(IRFactory.java:1990)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformPropertyGet(IRFactory.java:968)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:106)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformPropertyGet(IRFactory.java:964)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:106)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformPropertyGet(IRFactory.java:964)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:106)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformFunctionCall(IRFactory.java:595)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:86)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformInfix(IRFactory.java:775)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:161)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformAssignment(IRFactory.java:368)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:152)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformExprStmt(IRFactory.java:488)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:149)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformIf(IRFactory.java:762)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:110)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformIf(IRFactory.java:762)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:110)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformIf(IRFactory.java:768)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:110)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformBlock(IRFactory.java:406)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transform(IRFactory.java:82)
at net.sourceforge.htmlunit.corejs.javascript.IRFactory.transformFunction(IRFactory.java:560)

I have added the following lines to the catalina.sh file to allocate heap memory, but I still get the same error (my machine has 2 GB of RAM).

if [ -z "$LOGGING_MANAGER" ]; then
     JAVA_OPTS="$JAVA_OPTS -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager"
else
     JAVA_OPTS="$JAVA_OPTS $LOGGING_MANAGER"
fi

# Uncomment the following line to make the umask available when using the
# org.apache.catalina.security.SecurityListener
   JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.security.SecurityListener.UMASK=`umask`"
   JAVA_OPTS="$JAVA_OPTS  -Xms512m -Xmx2048m -XX:MaxPermSize=512m"
   JAVA_OPTS="-server -XX:+UseConcMarkSweepGC"

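Your last line assigns JAVA_OPTS without including its previous value, so it discards the -Xms, -Xmx, and -XX:MaxPermSize settings from the line above it. Include $JAVA_OPTS in that last line and your code should run: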

JAVA_OPTS="$JAVA_OPTS -server -XX:+UseConcMarkSweepGC"
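One way to confirm that the heap settings actually reach the JVM is to print the maximum heap size from inside the application after restarting Tomcat. The sketch below is only an illustration and was not part of the original answer: the class name `HeapCheck` and the method `maxHeapMiB` are hypothetical, and it relies solely on the standard `Runtime.maxMemory()` call.

```java
public class HeapCheck {

    // Returns the maximum amount of heap the JVM will attempt to use, in MiB.
    // (Hypothetical helper, named here for illustration only.)
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With -Xmx2048m in effect, the reported value should be in the
        // neighbourhood of 2048; it may be somewhat lower depending on the collector.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the printed value stays near the JVM default rather than the configured size, the -Xmx option is probably being dropped somewhere before the JVM starts, for example by a later assignment that overwrites JAVA_OPTS.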

I would set -XX:+HeapDumpOnOutOfMemoryError and then analyze the resulting heap dump with a tool such as Eclipse MAT.
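Before waiting for the next crash, it may be worth checking that the flag was really passed to the running JVM. The following is a minimal sketch under stated assumptions: the class `JvmFlagCheck` and the helper `hasFlag` are hypothetical names, and the check simply looks for an exact match in the argument list reported by the standard `RuntimeMXBean.getInputArguments()`.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmFlagCheck {

    // Returns true if the given flag appears verbatim in the JVM argument list.
    // (Hypothetical helper, named here for illustration only.)
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        for (String arg : jvmArgs) {
            if (arg.equals(flag)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with, e.g. those coming from JAVA_OPTS.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String flag = "-XX:+HeapDumpOnOutOfMemoryError";
        System.out.println(flag + " present: " + hasFlag(jvmArgs, flag));
        System.out.println("All JVM arguments: " + jvmArgs);
    }
}
```

Printing the full argument list also shows whether the -Xms and -Xmx options survived, which ties back to the JAVA_OPTS overwrite described in the other answer.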
