
Parsing HTML in Java using plain String methods?

Is it a good idea? I have used third-party libraries like JSoup and they work great, but this project is different. Is it worth loading and parsing a whole document when you only want to get one item from it? Some of the HTML pages are simple, so I could just use String methods. The reason is that memory will be an issue, and loading the document also takes some time. When parsing XML I always use a SAX parser because it doesn't load the document into memory and it is fast. Could I use the same approach on HTML documents, or is there already a parser like this out there? If there is a lightweight, non-DOM HTML parser, that would be great too.
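For illustration, here is a minimal sketch of what the plain-String approach might look like for pulling a single item (the page title) out of a simple page. The class and method names are hypothetical, and it assumes the markup is regular: one `<title>` element, no comments or scripts containing a lookalike tag. It is not a general HTML parser.

```java
final class TitleExtractor {

    /** Case-insensitive indexOf that does not copy or re-case the input. */
    static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the trimmed text between the first <title ...> and </title>,
     * or null if either tag is missing. Assumes simple, regular markup.
     */
    static String extractTitle(String html) {
        int open = indexOfIgnoreCase(html, "<title", 0);
        if (open < 0) {
            return null;
        }
        int start = html.indexOf('>', open);   // end of the opening tag
        if (start < 0) {
            return null;
        }
        int end = indexOfIgnoreCase(html, "</title", start);
        if (end < 0) {
            return null;
        }
        return html.substring(start + 1, end).trim();
    }

    public static void main(String[] args) {
        String page = "<html><head><TITLE> Hello </TITLE></head><body></body></html>";
        System.out.println(extractTitle(page));
    }
}
```

This still requires the page text to be held in a String, so it saves the cost of building a DOM rather than the cost of reading the document.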

If the HTML is XML-compliant (i.e. it's XHTML), you can use a standard SAX parser. A list of HTML parsers for Java to choose from is available at http://java-source.net/open-source/html-parsers. HotSax will probably handle all your use cases.
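As a rough sketch of the standard-SAX route, the following uses the JDK's built-in `javax.xml.parsers` API (not HotSax) to pick out the `<title>` text without building a DOM. The class names are hypothetical, and it assumes the input is well-formed XHTML; ordinary tag-soup HTML will make the parser throw.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Collects the text of the first <title> element as SAX events stream by. */
final class TitleHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;
    private boolean done;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so the tag name is in qName.
        if (!done && "title".equals(qName)) {
            inTitle = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inTitle && "title".equals(qName)) {
            inTitle = false;
            done = true;
        }
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Assumption: skip fetching any external DTD named in a DOCTYPE.
        return new InputSource(new StringReader(""));
    }

    String title() {
        return done ? text.toString().trim() : null;
    }
}

final class XhtmlTitleSax {
    /** Returns the first title in the given XHTML text, or null if there is none. */
    static String firstTitle(String xhtml) {
        TitleHandler handler = new TitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
        return handler.title();
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Hello</title></head><body><p>x</p></body></html>";
        System.out.println(firstTitle(page));
    }
}
```

For real use, the `InputSource` would wrap a stream rather than a String, so the document is never held in memory as a whole.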

