
Jsoup stops parsing a webpage

Jsoup.parse(String html) stops working. I have an application that uses jsoup several times to parse different pages, but when I try to parse a large page, jsoup simply stops with the error below. Does jsoup have a limit on the maximum size of a page?

java.lang.OutOfMemoryError
at java.lang.Object.internalClone(Native Method)
at java.lang.Object.clone(Object.java:82)
at java.lang.AbstractStringBuilder.append0(AbstractStringBuilder.java:172)
at java.lang.StringBuilder.append(StringBuilder.java:224)
at org.jsoup.parser.Tokeniser.emit(Tokeniser.java:76)
at org.jsoup.parser.TokeniserState$1.read(TokeniserState.java:26)
at org.jsoup.parser.Tokeniser.read(Tokeniser.java:42)
at org.jsoup.parser.TreeBuilder.runParser(TreeBuilder.java:101)
at org.jsoup.parser.TreeBuilder.parse(TreeBuilder.java:53)
at org.jsoup.parser.Parser.parse(Parser.java:24)
at org.jsoup.Jsoup.parse(Jsoup.java:44)
...

EDIT: I took a substring of the page containing only the first few thousand characters, and jsoup managed to parse it. So it seems there is a limit on the number of characters jsoup can handle. The data type is probably relevant here.

EDIT: After looking into what the error could be and trying to write my own HTML parser, which caused a lot of stress, I found out that the Dalvik VM allocates only 4.3 MB for the heap, which I assume varies from machine to machine. I'm going to try to increase it.
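
To check how much heap the VM actually offers before parsing, something like the following sketch may help. It uses only `Runtime.getRuntime().maxMemory()` from the standard library. The class name `HeapCheck`, the `likelyFits` helper, and the `safetyFactor` multiplier are hypothetical and only for illustration. The estimate assumes 2 bytes per character for the raw HTML string, and the multiplier for the parsed tree is a guess, not a measured jsoup figure.

```java
class HeapCheck {
    /** Maximum heap the VM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Rough size of a String's character data, assuming 2 bytes per char (UTF-16). */
    static long approxStringBytes(int length) {
        return 2L * length;
    }

    /**
     * Rough guess whether a page of the given length can be parsed.
     * safetyFactor is an assumed multiplier for the parser's working copies
     * and the resulting tree; tune it from your own measurements.
     */
    static boolean likelyFits(int htmlLength, int safetyFactor) {
        return approxStringBytes(htmlLength) * safetyFactor < maxHeapBytes();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap   (bytes): " + rt.maxMemory());
        System.out.println("total heap (bytes): " + rt.totalMemory());
        System.out.println("free heap  (bytes): " + rt.freeMemory());
        // Example: a page of 1,000,000 characters with an assumed factor of 5.
        System.out.println("1M-char page likely fits: " + likelyFits(1_000_000, 5));
    }
}
```

If the reported maximum is only a few megabytes, a large page can plausibly exhaust it inside the `StringBuilder.append` call shown in the stack trace.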

Try getting the page content with another method, such as HttpClient, and then call:

Jsoup.parse(String html);
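
A minimal sketch of that suggestion, using the standard `HttpURLConnection` in place of HttpClient so that it needs no extra dependency. The class name `PageFetcher` and its methods are hypothetical, and the UTF-8 charset is an assumption about the page. Whether fetching the content this way avoids the `OutOfMemoryError` is not confirmed in the thread, since the whole page is still held in memory as a String.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PageFetcher {
    /** Reads the whole stream into a String using the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    /** Downloads the page body as a String (charset assumed to be UTF-8). */
    static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // With jsoup on the classpath, the result would then be parsed as:
    //   Document doc = Jsoup.parse(PageFetcher.fetch(pageUrl));
}
```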
