HTML parser for phrase and case-sensitive searches, in Java

I would like to know if there are any HTML parsers in Java that support phrase and case-sensitive searches. All I need is the number of hits for the searched phrase in an HTML page, with support for case sensitivity.

Thanks, Sharma

Have you tried this?

You can search the text using regular expressions.
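As a rough sketch of the regular-expression suggestion, the phrase can be compiled with `Pattern.LITERAL` so that it is matched character for character, and case sensitivity becomes a flag. The class name `PhraseCounter` and its `count` method are made up for illustration; only `java.util.regex` from the JDK is assumed, and the input is taken to be plain text rather than raw HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseCounter {
    // Counts non-overlapping occurrences of a literal phrase in text.
    static int count(String text, String phrase, boolean caseSensitive) {
        if (phrase.isEmpty()) {
            return 0; // an empty phrase would otherwise match at every position
        }
        // LITERAL keeps characters such as '.' or '?' from acting as regex syntax
        int flags = Pattern.LITERAL;
        if (!caseSensitive) {
            flags |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        }
        Matcher m = Pattern.compile(phrase, flags).matcher(text);
        int hits = 0;
        while (m.find()) {
            hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "This is cool. Is Cool really cool? is cool!";
        System.out.println(count(text, "is cool", true));  // 2
        System.out.println(count(text, "is cool", false)); // 3
    }
}
```

Treating the phrase as a literal matters when users search for text containing punctuation; without it, a phrase such as "cool?" would be interpreted as a pattern.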

Doesn't it help if you treat the HTML page as text and strip the HTML tags:

String noHTMLString = htmlString.replaceAll("\\<.*?\\>", "");

and then count what you need in noHTMLString? This can be helpful if you have an HTML page with markup like:

this is <span>cool</span>

and you need to look for the text "is cool" (because the HTML above will be transformed into the string "this is cool"). To count, you can use StringUtils from Apache Commons Lang, which has a method called countMatches. Put together, it should work like this:

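String htmlString = "this is <span>cool</span>";
// Strip anything that looks like a tag, leaving "this is cool"
String noHTMLString = htmlString.replaceAll("\\<.*?\\>", "");
// countMatches is case-sensitive; count is 1 here
int count = StringUtils.countMatches(noHTMLString, "is cool");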
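Since `StringUtils.countMatches` compares case-sensitively, a case-insensitive hit count needs an extra step. The following is a minimal sketch of the same strip-and-count idea using only the JDK, with case sensitivity as a parameter. The names `HtmlPhraseHits`, `stripTags` and `countHits` are hypothetical, and the tag-stripping regex is the same heuristic as above, not a real HTML parser.

```java
import java.util.Locale;

class HtmlPhraseHits {
    // Removes anything between '<' and the next '>' on the same line.
    static String stripTags(String html) {
        return html.replaceAll("\\<.*?\\>", "");
    }

    // Counts non-overlapping occurrences of phrase in the tag-free text.
    static int countHits(String html, String phrase, boolean caseSensitive) {
        if (phrase.isEmpty()) {
            return 0;
        }
        String text = stripTags(html);
        if (!caseSensitive) {
            // Lower-case both sides so that the comparison ignores case
            text = text.toLowerCase(Locale.ROOT);
            phrase = phrase.toLowerCase(Locale.ROOT);
        }
        int hits = 0;
        int from = text.indexOf(phrase);
        while (from != -1) {
            hits++;
            from = text.indexOf(phrase, from + phrase.length());
        }
        return hits;
    }

    public static void main(String[] args) {
        String html = "this is <span>cool</span>, and this IS <b>Cool</b> too";
        System.out.println(countHits(html, "is cool", true));  // 1
        System.out.println(countHits(html, "is cool", false)); // 2
    }
}
```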

I would go with the strip-and-count approach, or at least give it a try. It sounds simpler than parsing the HTML and then traversing it to look for the words you need.
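One caveat worth checking on real pages: markup is often wrapped across lines, so a phrase can be separated by a newline or several spaces after the tags are stripped. A small sketch of collapsing whitespace before counting is shown here; the class `NormalizedText` and the method `toSearchableText` are invented names, and whether this step is needed depends on the pages being searched.

```java
class NormalizedText {
    // Strips tags with the same heuristic regex, then collapses
    // every run of whitespace (spaces, tabs, newlines) into one space.
    static String toSearchableText(String html) {
        String noTags = html.replaceAll("\\<.*?\\>", "");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "this is\n   <span>cool</span>\n";
        System.out.println(toSearchableText(html)); // this is cool
    }
}
```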
